Chapter 2 Differentiation

Now that we have a good understanding of the Euclidean space $\mathbb{R}^n$, we are ready to discuss the concept of differentiation in multivariable calculus. We are going to deal with functions defined on one Euclidean space with values in another Euclidean space. We shall use the shorthand notation

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$$

to describe such a situation. This means that $f$ can be written in the form

$$f(x) = (f_1(x), f_2(x), \dots, f_m(x)),$$

where $x = (x_1, x_2, \dots, x_n)$ and each coordinate function $f_i$ is a real-valued function on $\mathbb{R}^n$. In these situations it may be that $f$ is not defined on all of $\mathbb{R}^n$, but we'll continue with the above shorthand.

There are two important special cases: n = 1 and m = 1, respectively. We shall quickly 
see that the case n = 1 is much, much simpler than all other cases. We shall also learn that 
the case m = 1 already contains almost all the interesting mathematics that we investigate — 
the generalization to m > 1 will prove to be very easy indeed. 





A. Functions of one real variable (n = 1) 


In the situation

$$\mathbb{R} \xrightarrow{f} \mathbb{R}^m$$

we shall typically denote the real numbers in the domain of $f$ by the letter $t$, and the points in $\mathbb{R}^m$ in the usual manner by $x = (x_1, x_2, \dots, x_m)$. As we mentioned above, we can represent $f$ in terms of its coordinate functions:





$$f(t) = (f_1(t), f_2(t), \dots, f_m(t)).$$

This formula displays the vector $f(t)$ in terms of its coordinates, so that the function $f$ can be regarded as composed of $m$ real-valued functions $f_1, f_2, \dots, f_m$. We often like to think of real-valued functions in terms of their graphs, but when $m > 1$ this viewpoint seems somewhat cumbersome. A more useful way to think of $f$ in these higher dimensions is to imagine the points $f(t)$ “plotted” in $\mathbb{R}^m$ with regard to the independent variable $t$. In case $f(t)$ depends continuously on $t$, the points $f(t)$ then form some sort of continuous curve in $\mathbb{R}^m$:
[Figure: the points $f(t)$ tracing a curve in $\mathbb{R}^m$, with arrows marking the direction of increasing $t$.]


We have placed arrows on our picture to indicate the direction of increasing $t$. Thus the points $f(t)$ form a sort of “curve” (whatever that may mean) in $\mathbb{R}^m$.

We need to understand well the definition of limit as $t \to t_0$ and/or continuity at $t_0$. Since we are interested in the size of $f(t) - f(t_0)$, we can simply use the definition of continuity for real-valued functions, modified so that instead of the absolute value we use the norm. Thus we have the








DEFINITION. Let $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$. Then $f$ is continuous at $t_0$ if for each $\epsilon > 0$ there exists $\delta > 0$ such that

$$|t - t_0| < \delta \implies \|f(t) - f(t_0)\| < \epsilon.$$




















PROBLEM 2-1. Let $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$ and let $L \in \mathbb{R}^m$. Write out the correct definition of

$$\lim_{t \to t_0,\ t \ne t_0} f(t) = L.$$

Then prove that

$$f \text{ is continuous at } t_0 \iff \lim_{t \to t_0,\ t \ne t_0} f(t) = f(t_0).$$


In preparation for the important characterization of continuity by means of the coordinate 
functions, work the following 











PROBLEM 2-2. Prove that for all $x \in \mathbb{R}^m$

$$\max_{1 \le i \le m} |x_i| \le \|x\| \le \sqrt{m}\, \max_{1 \le i \le m} |x_i|.$$
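Before writing the proof of Problem 2-2, a quick numerical spot-check can build confidence. The Python sketch below is an illustration only, not a proof. The helper names `max_abs`, `norm`, and `check_bounds` are hypothetical. The sketch tests both inequalities on random vectors of random dimension, with a small tolerance for rounding.

```python
import math
import random

def max_abs(x):
    """Largest absolute value among the coordinates of x."""
    return max(abs(xi) for xi in x)

def norm(x):
    """Euclidean norm ||x||."""
    return math.sqrt(sum(xi * xi for xi in x))

def check_bounds(x, eps=1e-12):
    """True if max|x_i| <= ||x|| <= sqrt(m) * max|x_i| (up to rounding eps)."""
    m = len(x)
    return max_abs(x) <= norm(x) + eps and norm(x) <= math.sqrt(m) * max_abs(x) + eps

random.seed(0)
samples = [[random.uniform(-5.0, 5.0) for _ in range(random.randint(1, 6))]
           for _ in range(1000)]
print(all(check_bounds(x) for x in samples))  # True
```

The vector $(1, \dots, 1)$ attains the right-hand bound. A vector with a single nonzero coordinate attains the left-hand one.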

















THEOREM. Let $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$. Then $f$ is continuous at $t_0$ $\iff$ all the coordinate functions $f_1, f_2, \dots, f_m$ are continuous at $t_0$.

PROOF. $\implies$: Let $\epsilon > 0$. Then the continuity of $f$ guarantees that there exists $\delta > 0$ such that

$$|t - t_0| < \delta \implies \|f(t) - f(t_0)\| < \epsilon.$$

By Problem 2-2, conclude that

$$|t - t_0| < \delta \implies |f_i(t) - f_i(t_0)| < \epsilon$$

for each $i$. Thus each $f_i$ is continuous at $t_0$.


PROBLEM 2-3. Write out in careful detail the proof of the other half ($\impliedby$) of the theorem.


QED 

Though this result reduces continuity to that of real-valued functions, we prefer the original definition in terms of the norm of $f(t) - f(t_0)$, because that definition is more “geometric” and does not involve the coordinates of $\mathbb{R}^m$ at all.

We shall always assume that $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$ is at least continuous and defined on an interval in $\mathbb{R}$, which could be all of $\mathbb{R}$ itself.


























DEFINITION. A curve in $\mathbb{R}^m$ is a continuous function $f$ from an interval $[a, b]$ into $\mathbb{R}^m$. The independent variable $a \le t \le b$ is sometimes called a parameter for the curve.
Here are some specific examples:

• Straight line: $f(t) = x + tv$, where $x, v \in \mathbb{R}^m$, $v \ne 0$.
• Unit circle in $\mathbb{R}^2$: $f(t) = (\cos t, \sin t)$.
• Unit circle in $\mathbb{R}^2$: $f(t) = (\cos t, -\sin t)$.
• Helix in $\mathbb{R}^3$: $f(t) = (\cos t, \sin t, t)$.
• A curve in $\mathbb{R}^2$: $f(t) = (t^2, t^3)$.





REMARK. Our definition of “curve” is perhaps somewhat unusual. Normally we think of a curve in $\mathbb{R}^m$ as some sort of subset of $\mathbb{R}^m$ which has a continuous one-dimensional shape. This would correspond to the set which is the image of $f$ in our actual definition,

$$\{f(t) \mid a \le t \le b\}.$$


However, it seems best to keep our definition, which provides the extra information of a 
parameter t for the curve. It might be better to call the function f a parametrized curve, but 
that just seems too cumbersome. 
Now we turn to the basic definition which introduces calculus in this context.

DEFINITION. Let $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$. Then $f$ is differentiable at $t_0$ if the following limit exists:

$$f'(t_0) = \frac{df}{dt}(t_0) = \lim_{t \to t_0} \frac{f(t) - f(t_0)}{t - t_0}.$$





In terms of coordinates for $\mathbb{R}^m$ the result is like that for continuity, namely, $f$ is differentiable $\iff$ all the coordinate functions are differentiable. Moreover,

$$\frac{d}{dt}(f_1, \dots, f_m) = \left(\frac{df_1}{dt}, \dots, \frac{df_m}{dt}\right).$$

We also say $f$ is differentiable if it is differentiable at $t_0$ for every $t_0$. This coordinatewise calculation of the derivative $f'(t_0)$ is valid because of the corresponding theorem we proved above.
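The coordinatewise rule can be observed numerically. The Python sketch below uses hypothetical helper names. It forms the difference quotient $(f(t) - f(t_0))/(t - t_0)$ for the helix $f(t) = (\cos t, \sin t, t)$ from the examples above. It then compares the result, coordinate by coordinate, with $(f_1', f_2', f_3') = (-\sin t_0, \cos t_0, 1)$.

```python
import math

def helix(t):
    """The helix f(t) = (cos t, sin t, t) from the examples above."""
    return (math.cos(t), math.sin(t), t)

def difference_quotient(f, t0, dt):
    """Vector (f(t0 + dt) - f(t0)) / dt, computed coordinate by coordinate."""
    a, b = f(t0 + dt), f(t0)
    return tuple((ai - bi) / dt for ai, bi in zip(a, b))

t0 = 0.7
exact = (-math.sin(t0), math.cos(t0), 1.0)  # (f_1', f_2', f_3') at t0
for dt in (1e-1, 1e-3, 1e-5):
    q = difference_quotient(helix, t0, dt)
    err = max(abs(qi - ei) for qi, ei in zip(q, exact))
    print(f"dt={dt:g}  max coordinate error={err:.2e}")
```

The error shrinks roughly in proportion to `dt`, as expected for a one-sided quotient.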

A helpful way to visualize $f'(t_0)$ is to draw the “arrow” from $0$ to $f'(t_0)$. In drawing this picture we like to position the vector $f'(t_0)$ so that its “tail” is at $f(t_0)$:

[Figure: the curve through $f(t_0)$, with the arrow $f'(t_0)$ drawn from $f(t_0)$.]

We then employ the phrase tangent vector at $t = t_0$. This nice geometrical picture is connected with the “finite” picture of the secant vector

$$\frac{f(t) - f(t_0)}{t - t_0}.$$

[Figure: the points $f(t_0)$ and $f(t)$ on the curve, joined by the secant vector.]

Thus the tangent vector at $t = t_0$ is the limit of the secant vector as $t \to t_0$.


EXAMPLE. $f(t) = (\cos\frac{t}{2}, \sin\frac{t}{2})$. Here $f'(0) = (0, \frac{1}{2})$, drawn with its tail at $f(0) = (1, 0)$.

PROBLEM 2-4. Consider the circle in $\mathbb{R}^2$ described as

$$f(t) = (a_1 + r\cos\alpha t,\ a_2 + r\sin\alpha t).$$

This is a parametrization of what we have denoted by $S(a, r)$ in Section 1F. Here $\alpha \ne 0$ is a real constant. Prove that

$$f'(t) \cdot (f(t) - a) = 0.$$

What is the geometrical interpretation of this result?


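EXAMPLE. $f(t) = (t^2, t^3)$. Here $f'(-1) = (-2, 3)$, drawn with its tail at $f(-1) = (1, -1)$.

[Figure: the image of $f$, which has a cusp at the origin, with the tangent vector at $(1, -1)$.]

Notice that in this example we have $f'(0) = (0, 0)$. In a sense this explains why the image in $\mathbb{R}^2$ has a nonsmooth appearance at the origin though the curve $f$ is differentiable.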
PROBLEM 2-5. Let $f(t) = (t^2, t|t|)$ for $-\infty < t < \infty$. Show that $f$ is differentiable and sketch its image in $\mathbb{R}^2$.
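PROBLEM 2-6. Consider the “figure 8 curve” given as the set of points $(x, y) \in \mathbb{R}^2$ which satisfy the equation

$$(x^2 + y^2)^2 = x^2 - y^2.$$

a. Sketch this set reasonably accurately (you might use the corresponding polar coordinate equation $r^2 = \cos 2\theta$).

b. Show that the curve

$$f(t) = \left(\frac{\cos t}{1 + \sin^2 t},\ \frac{\sin t\cos t}{1 + \sin^2 t}\right), \quad 0 \le t \le 2\pi,$$

is a parametrization with the following feature. Apart from the endpoint identification $f(0) = f(2\pi)$, we have $f(s) = f(t) \implies s = t$, or $s = \frac{\pi}{2}, t = \frac{3\pi}{2}$, or $s = \frac{3\pi}{2}, t = \frac{\pi}{2}$.

c. From (b) we have $f(\frac{\pi}{2}) = f(\frac{3\pi}{2}) = (0, 0)$. Show that

$$f'\left(\frac{\pi}{2}\right) = \left(-\frac{1}{2}, -\frac{1}{2}\right), \qquad f'\left(\frac{3\pi}{2}\right) = \left(\frac{1}{2}, -\frac{1}{2}\right).$$

d. Conclude that $f$ is differentiable at every $t$, and that it provides two distinct tangent vectors at the geometric point $(0, 0)$ on the original figure 8 curve.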
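Parts (b) and (c) can be sanity-checked numerically before doing the algebra. The Python sketch below assumes the parametrization in (b). The names `fig8`, `on_curve`, and `derivative` are hypothetical. The derivative is a central-difference estimate, not an exact computation.

```python
import math

def fig8(t):
    """Parametrization (cos t, sin t cos t) / (1 + sin^2 t) of the figure 8 curve."""
    d = 1.0 + math.sin(t) ** 2
    return (math.cos(t) / d, math.sin(t) * math.cos(t) / d)

def on_curve(p, tol=1e-12):
    """Check (x^2 + y^2)^2 = x^2 - y^2 up to rounding."""
    x, y = p
    return abs((x * x + y * y) ** 2 - (x * x - y * y)) < tol

def derivative(f, t, h=1e-6):
    """Central-difference estimate of f'(t), coordinatewise."""
    a, b = f(t + h), f(t - h)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

print(fig8(math.pi / 2), fig8(3 * math.pi / 2))  # both approximately (0, 0)
print(derivative(fig8, math.pi / 2))      # approximately (-0.5, -0.5)
print(derivative(fig8, 3 * math.pi / 2))  # approximately (0.5, -0.5)
```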


PROBLEM 2-7. Sketch the set of points in $\mathbb{R}^2$ described by the equation $y^2 = x^2(x + 1)$. Show that the curve

$$f(t) = (t^2 - 1,\ t^3 - t), \quad -\infty < t < \infty,$$

gives all points on the given set. Indicate on your sketch the three tangent vectors $f'(0)$, $f'(1)$, and $f'(-1)$.





We close this section with the following useful observation. There is a standard theorem in single-variable calculus which asserts that differentiability $\implies$ continuity. The same result is valid in the present context, and the same proof applies:

THEOREM. If $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$ is differentiable at $t_0$, then $f$ is continuous at $t_0$.














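PROOF. We use the fact that the limit of a product equals the product of the limits. Thus

$$\begin{aligned}
\lim_{t \to t_0} \bigl(f(t) - f(t_0)\bigr) &= \lim_{t \to t_0} (t - t_0)\,\frac{f(t) - f(t_0)}{t - t_0} \\
&= \lim_{t \to t_0} (t - t_0) \cdot \lim_{t \to t_0} \frac{f(t) - f(t_0)}{t - t_0} \\
&= 0 \cdot f'(t_0) \\
&= 0.
\end{aligned}$$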


QED 


B. Lengths of curves 




















We continue with our discussion of the special case $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$. We assume that $f$ is differentiable.


KINEMATIC TERMINOLOGY. We often think of the independent variable $t$ as time. Then the derivative gives the important quantities

$f'(t)$ = velocity (vector) of the curve at time $t$,
$\|f'(t)\|$ = speed of the curve at time $t$.

In terms of the coordinate functions, the speed is

$$\|f'(t)\| = \sqrt{\left(\frac{df_1}{dt}\right)^2 + \cdots + \left(\frac{df_m}{dt}\right)^2}.$$


Taking our cue from the basic fact that distance = speed × time, we next define the length of the curve $[a, b] \xrightarrow{f} \mathbb{R}^m$ to be the following:

DEFINITION. Assume that the curve $[a, b] \xrightarrow{f} \mathbb{R}^m$ is differentiable. Then its length is the definite integral

$$\int_a^b \|f'(t)\|\, dt,$$

provided that this integral exists.


We are not going to discuss in any detail the issue of the existence of this integral. The quantity $\|f'(t)\|$ might not be a continuous function of $t$, and in fact it might happen that the integral assigns the value $\infty$ to the length. Examples of this behavior are easily found. Here is one:





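PROBLEM 2-8. Let $f$ be the curve in $\mathbb{R}^2$ defined by

$$f(t) = \begin{cases} (t^2\cos t^{-2},\ t^2\sin t^{-2}) & \text{for } 0 < t \le 1, \\ (0, 0) & \text{for } t = 0. \end{cases}$$

a. Prove that $f$ is differentiable (even at $t = 0$).

b. Prove that

$$\|f'(t)\| = 2\sqrt{t^2 + t^{-2}} \quad (0 < t \le 1).$$

c. Prove that

$$\int_0^1 \|f'(t)\|\, dt = \infty.$$

(HINT: use a lower bound for the integrand.)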


(Incidentally, the existence of the integral has nothing to do with whether we can evaluate it in closed form. In fact, lengths of curves are usually very difficult to calculate, because of the square root involved in computing $\|f'(t)\|$.)


REMARKS. Most curves that actually arise in calculus are piecewise continuously differentiable. This means that there is a partition of the parameter interval $a = t_0 < t_1 < \cdots < t_k = b$ such that $f$ is differentiable on each closed interval $[t_{i-1}, t_i]$ and $f'$ is a continuous function there. Strictly speaking, $f$ itself is not necessarily differentiable at the points $t_i$, but this causes no problem with computing the above integral. Thus we think of curves which may have “corners,” but for which the tangent vectors have limits as we approach the corners (but the limits may be different from one another as $t \to t_i$ from $t < t_i$ or from $t > t_i$). The formula for length in this case is then given as the sum of the lengths of the pieces,

$$\sum_{i=1}^{k} \int_{t_{i-1}}^{t_i} \|f'(t)\|\, dt.$$

This is really the same as the original definition in view of the fact that the integral from $a$ to $b$ is independent of the values the integrand takes (or does not have) at the finitely many points $t_0, t_1, \dots, t_k$.
EXAMPLE. A circle of radius $R$ is described parametrically by

$$f(t) = (R\cos t, R\sin t), \quad 0 \le t \le 2\pi.$$

The velocity is $f'(t) = (-R\sin t, R\cos t)$ and the speed is therefore $\|f'(t)\| = R$. Thus the length of this circle is

$$\int_0^{2\pi} R\, dt = 2\pi R.$$
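The length integral can also be approximated by a Riemann sum. This is a useful cross-check when the integral has no closed form. The Python sketch below assumes the velocity $f'$ has been supplied by hand. The name `arc_length` and the midpoint rule are choices made for this illustration; the text does not prescribe them.

```python
import math

def arc_length(fprime, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of ||f'(t)|| over [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        v = fprime(a + (i + 0.5) * dt)
        total += math.sqrt(sum(c * c for c in v)) * dt
    return total

R = 3.0

def circle_velocity(t):
    """Velocity of the circle f(t) = (R cos t, R sin t)."""
    return (-R * math.sin(t), R * math.cos(t))

print(arc_length(circle_velocity, 0.0, 2 * math.pi), 2 * math.pi * R)
```

For the circle the speed is constant, so the sum reproduces $2\pi R$ up to rounding. For curves with nonconstant speed, the result depends on `n`.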


There is a related way to view the definition of length of a curve. Namely, think of a “polygon” “inscribed” in the given curve. Such a polygon may be defined by choosing a partition $a = t_0 < t_1 < \cdots < t_k = b$ of the parameter interval and using the line segments $[f(t_{i-1}), f(t_i)]$, $1 \le i \le k$, to approximate the arc. Then the length of this polygon is

$$\sum_{i=1}^{k} \|f(t_i) - f(t_{i-1})\|.$$


By the way, the polygon is an example of the piecewise continuously differentiable curves we discussed above, and Problem 2-10 below shows that the sum given here is indeed its length.


[Figure: a curve from $f(a)$ to $f(b)$ with an inscribed polygon.]


Since $f$ is differentiable, we know that the norm

$$\left\|\frac{f(t_i) - f(t_{i-1})}{t_i - t_{i-1}}\right\|$$

is as close as we please to $\|f'(t_{i-1})\|$, provided $t_i - t_{i-1}$ is sufficiently small. Thus the length $\|f(t_i) - f(t_{i-1})\|$ is very well approximated by $\|f'(t_{i-1})\|(t_i - t_{i-1})$. Thus we expect the Riemann sum

$$\sum_{i=1}^{k} \|f'(t_{i-1})\|(t_i - t_{i-1})$$

to be a good approximation to the length of the polygon. On the other hand, the length of the polygon should be a good approximation of the length of the curve, so that the Riemann sum should be a good approximation of the length.
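Both approximations are easy to compute side by side. The Python sketch below uses the curve $f(t) = (t^2, t^3)$, $-1 \le t \le 0$, from Problem 2-9. For simplicity it assumes uniform partitions. The names `polygon_length` and `riemann_sum` are hypothetical. The partition is refined from 4 to 16 to 256 pieces, each a refinement of the last. Refinement never shortens the inscribed polygon, and both numbers settle near the closed-form value of the length integral.

```python
import math

def f(t):
    """The curve (t^2, t^3) of Problem 2-9."""
    return (t * t, t ** 3)

def fprime(t):
    """Its velocity (2t, 3t^2)."""
    return (2 * t, 3 * t * t)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def polygon_length(f, a, b, k):
    """Length of the polygon inscribed at a uniform partition t_0 < ... < t_k."""
    ts = [a + (b - a) * i / k for i in range(k + 1)]
    return sum(norm([p - q for p, q in zip(f(ts[i]), f(ts[i - 1]))])
               for i in range(1, k + 1))

def riemann_sum(fprime, a, b, k):
    """Sum of ||f'(t_{i-1})|| (t_i - t_{i-1}) over a uniform partition."""
    dt = (b - a) / k
    return sum(norm(fprime(a + i * dt)) * dt for i in range(k))

exact = (13 * math.sqrt(13) - 8) / 27  # closed form of the length integral
for k in (4, 16, 256):
    print(k, polygon_length(f, -1.0, 0.0, k), riemann_sum(fprime, -1.0, 0.0, k), exact)
```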

All of the above can be made rigorous, if needed, with some moderate hypothesis on f. 
However, as we are giving a definition of length, we choose not to pause to give the proof. 

In fact, the idea of looking at inscribed polygons can be made into a definition of length which doesn't even mention the derivative at all. Suppose $f : [a, b] \to \mathbb{R}^m$ is any curve (still required to be continuous). Then we can define

$$\text{length of } f = \sup\{L\},$$

where $\{L\}$ stands for the set of numbers formed by all possible lengths $L$ of polygons inscribed in $f$. It could happen that the length of $f$ is $\infty$; in case it is finite we say that $f$ is rectifiable.
It is then a theorem that if f is piecewise continuously differentiable, then f is rectifiable and 
the two definitions of length produce the same number. 


PROBLEM 2-9. Find the length of the curve $f(t) = (t^2, t^3)$ for $-1 \le t \le 0$.

[Answer: $(13\sqrt{13} - 8)/27$]


PROBLEM 2-10. The curve $f(t) = x + t(y - x)$, $0 \le t \le 1$, represents the line segment from $x$ to $y$. Check that its length is $\|y - x\|$.











PROBLEM 2-11. Find the length of the helix in $\mathbb{R}^3$ given by

$$f(t) = (R\cos t, R\sin t, at), \quad 0 \le t \le 2\pi.$$











PROBLEM 2-12. Find the length of the parabolic arch in $\mathbb{R}^2$ described by $f(t) = (t, a^2 - t^2)$, $-a \le t \le a$.














PROBLEM 2-13. Find the length of the exponential curve in $\mathbb{R}^2$ given as $f(t) = (t, e^t)$, $0 \le t \le 1$.

PROBLEM 2-14. Find the length of the logarithmic curve in $\mathbb{R}^2$ given as $f(t) = (\log t, t)$, $1 \le t \le e$.

PROBLEM 2-15. A curve in $\mathbb{R}^2$ called a cycloid is described by $f(t) = (t - \sin t, 1 - \cos t)$, $0 \le t \le 2\pi$. Find its length.

PROBLEM 2-16. A hypocycloid in $\mathbb{R}^2$ is described as the set of all points $(x, y)$ satisfying $x^{2/3} + y^{2/3} = 1$. Draw a sketch of this set. Define the associated curve by $f(t) = (\cos^3 t, \sin^3 t)$ for $0 \le t \le 2\pi$, and compute its length.

PROBLEM 2-17. A curve in $\mathbb{R}^3$ is described by $f(t) = (t - \sin t, 1 - \cos t, 4\sin\frac{t}{2})$. Show that its speed is $2$.

PROBLEM 2-18. The preceding problem is an example of a curve invented just so its length can be calculated easily. Choose the constant $a$ just right to render the length of the curve $f(t) = (t, t^2, at^3)$ easily computable.

PROBLEM 2-19. Find the length of the catenary described as $f(t) = (t, \cosh t)$, $-a \le t \le a$.
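The claim in Problem 2-17 can be checked numerically in a few lines. The Python sketch below assumes the velocity has already been computed by hand as $f'(t) = (1 - \cos t, \sin t, 2\cos\frac{t}{2})$. It confirms the speed numerically, but the algebraic verification is still the point of the problem.

```python
import math

def velocity(t):
    """Hand-computed derivative of f(t) = (t - sin t, 1 - cos t, 4 sin(t/2))."""
    return (1 - math.cos(t), math.sin(t), 2 * math.cos(t / 2))

def speed(t):
    return math.sqrt(sum(c * c for c in velocity(t)))

samples = [speed(0.05 * k) for k in range(400)]
print(min(samples), max(samples))  # both equal to 2 up to rounding error
```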


It is quite important to realize and exploit the fact that the length of a curve is unchanged if a reasonable change of the independent variable is made. We now explain and prove this feature. We suppose that $[a, b] \xrightarrow{f} \mathbb{R}^m$ is the curve, so $a \le t \le b$. We also suppose that another “parameter” $s$ is to be used, $c \le s \le d$. And we suppose these are related by a differentiable function $\varphi$, so that $t = \varphi(s)$. We further suppose that $\varphi$ is increasing, with $\varphi(c) = a$ and $\varphi(d) = b$:

[Figure: the graph of $t = \varphi(s)$, increasing from $a$ at $s = c$ to $b$ at $s = d$.]














Then we have, strictly speaking, a different curve $g$ given by the composition

$$g = f \circ \varphi; \quad \text{that is,} \quad g(s) = f(\varphi(s)).$$

Now we begin to compute the length of $g$. We first notice the consequence of the chain rule,

$$\underbrace{g'(s)}_{\text{vector}} = \underbrace{f'(\varphi(s))}_{\text{vector}}\ \underbrace{\varphi'(s)}_{\text{scalar}}.$$

This is a consequence of the chain rule of single-variable calculus, and is proved by simply writing down the corresponding equation for each coordinate function of $g$. (Notice that we have written vector times scalar on the right side; it's the same product as the usual scalar times vector.) Since $\varphi' \ge 0$, the norms are related by

$$\|g'(s)\| = \|f'(\varphi(s))\|\,\varphi'(s).$$

Thus we have

$$\begin{aligned}
\text{length of } g &= \int_c^d \|g'(s)\|\, ds \\
&= \int_c^d \|f'(\varphi(s))\|\,\varphi'(s)\, ds \\
&= \int_a^b \|f'(t)\|\, dt \qquad (t = \varphi(s)) \\
&= \text{length of } f.
\end{aligned}$$
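In case the orientation is reversed, so that $\varphi' \le 0$ (now $\varphi(c) = b$ and $\varphi(d) = a$), we get the same result:

$$\begin{aligned}
\int_c^d \|g'(s)\|\, ds &= \int_c^d \|f'(\varphi(s))\|\,(-\varphi'(s))\, ds \\
&= -\int_b^a \|f'(t)\|\, dt = \int_a^b \|f'(t)\|\, dt.
\end{aligned}$$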
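The invariance under reparametrization can be observed numerically. The Python sketch below uses hypothetical helper names, and the midpoint rule is an arbitrary choice. It measures the unit circle three ways: with its own parameter, after the increasing substitution $t = \varphi(s) = s^2$ on $[0, \sqrt{2\pi}]$, and after the orientation-reversing substitution $t = 2\pi - s$. In each case $g'$ is assembled by the chain rule $g'(s) = f'(\varphi(s))\,\varphi'(s)$.

```python
import math

def length(gprime, c, d, n=20_000):
    """Midpoint Riemann sum for the integral of ||g'(s)|| over [c, d]."""
    ds = (d - c) / n
    total = 0.0
    for i in range(n):
        v = gprime(c + (i + 0.5) * ds)
        total += math.sqrt(sum(x * x for x in v)) * ds
    return total

def fprime(t):
    """Velocity of the unit circle f(t) = (cos t, sin t), 0 <= t <= 2*pi."""
    return (-math.sin(t), math.cos(t))

def g_inc_prime(s):
    """g(s) = f(s^2) on [0, sqrt(2*pi)]; chain rule gives f'(s^2) * 2s."""
    return tuple(v * 2 * s for v in fprime(s * s))

def g_dec_prime(s):
    """g(s) = f(2*pi - s) on [0, 2*pi]; chain rule gives f'(2*pi - s) * (-1)."""
    return tuple(-v for v in fprime(2 * math.pi - s))

print(length(fprime, 0.0, 2 * math.pi))
print(length(g_inc_prime, 0.0, math.sqrt(2 * math.pi)))
print(length(g_dec_prime, 0.0, 2 * math.pi))
```

All three printed values agree with $2\pi$ to many digits.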


Because of this invariance, if we deal with a set of points in $\mathbb{R}^m$ that is clearly equal to the image of some curve in a one-to-one fashion, we say that its length is the length of the corresponding curve. Thus, we have no qualms about saying the length of the circle $S(a, r) \subset \mathbb{R}^2$ is $2\pi r$, even though we haven't displayed a parametrization of $S(a, r)$.

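For instance, the length of a graph $y = F(x)$, $a \le x \le b$, in $\mathbb{R}^2$ is given by the usual formula

$$\int_a^b \sqrt{1 + \left(\frac{dF}{dx}\right)^2}\, dx.$$

[Figure: the graph of $y = F(x)$ over the interval $[a, b]$.]

This is seen by using the parametrization $f(x) = (x, F(x))$, $a \le x \le b$, whose velocity is $f'(x) = (1, F'(x))$.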


There are various calculations we need to perform with this derivative of functions of one real variable. The chain rule, which we have already seen, is quite important. Other important ones are versions of the product rule:

Scalar times vector: $(hf)' = hf' + h'f$,
Vector dot vector: $(f \cdot g)' = f \cdot g' + f' \cdot g$.


PROBLEM 2-20. Prove the two versions of the product rule. 











PROBLEM 2-21. Suppose a curve in $\mathbb{R}^m$ lies on a sphere. That is, $f(t) \in S(a, r)$ for all $t$. Prove that the velocity $f'(t)$ is tangent to the sphere and the acceleration vector satisfies $(f(t) - a) \cdot f''(t) \le 0$. What is the kinematic interpretation of that inequality?





A special case of the latter version of the product rule is frequently of great use: for a curve in $\mathbb{R}^m$,

$$\frac{d}{dt}\|f\|^2 = 2 f \cdot f'.$$











A nice kinematic fact follows from this. If a curve $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$ has an acceleration $f''$ and constant speed, then its acceleration is orthogonal to the curve. The proof is easy: we apply the above formula to $f'$ rather than $f$ and use the fact that $\|f'\|^2$ is constant. Thus

$$0 = \frac{d}{dt}\|f'\|^2 = 2 f' \cdot f''.$$

That is, $f''$ is orthogonal to the tangent vector $f'$.
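Here is a numerical illustration of this fact for the constant-speed helix $(R\cos t, R\sin t, at)$. The Python sketch below estimates $f'$ and $f''$ by central differences instead of differentiating by hand. The values $R = 2$, $a = 0.5$ and the step size are arbitrary assumptions, so the dot product comes out small rather than exactly zero.

```python
import math

R, a = 2.0, 0.5

def helix(t):
    """Constant-speed helix (R cos t, R sin t, a t); its speed is sqrt(R^2 + a^2)."""
    return (R * math.cos(t), R * math.sin(t), a * t)

def d1(f, t, h=1e-4):
    """Central-difference approximation of the velocity f'(t)."""
    return tuple((p - q) / (2 * h) for p, q in zip(f(t + h), f(t - h)))

def d2(f, t, h=1e-4):
    """Central-difference approximation of the acceleration f''(t)."""
    return tuple((p - 2 * m + q) / (h * h) for p, m, q in zip(f(t + h), f(t), f(t - h)))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

for t in (0.0, 1.0, 2.5):
    v, acc = d1(helix, t), d2(helix, t)
    print(f"t={t}: speed={math.sqrt(dot(v, v)):.6f}  f'.f''={dot(v, acc):.2e}")
```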








PROBLEM 2-22. Let $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$ be a differentiable curve and assume $x \in \mathbb{R}^m$ is a point which is not “on” the curve. Suppose $f(t_0)$ is a point on the curve which is closest to $x$: that is,

$$\|f(t_0) - x\| \le \|f(t) - x\| \quad \text{for all } t.$$

Prove that

$$f(t_0) - x \text{ is orthogonal to } f'(t_0).$$
PROBLEM 2-23. Consider the parabola $y = x^2$ in $\mathbb{R}^2$. Let $-\infty < a < \infty$ and find the point(s) on the parabola which is (are) closest to $(0, a)$.
[Careful: you should discover two cases.]











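PROBLEM 2-24. For any number $0 \le \alpha < 2\pi$ define the curve $f_\alpha$ in $\mathbb{R}^4$ by

$$f_\alpha(t) = (\cos t, \sin t, \cos(t + \alpha), \sin(t + \alpha)), \quad 0 \le t \le 2\pi.$$

a. Show that for each fixed $\alpha$ the image $\{f_\alpha(t) \mid 0 \le t \le 2\pi\}$ is a circle $C_\alpha$ which lies in a certain two-dimensional plane in $\mathbb{R}^4$.

b. Show that each $C_\alpha$ has center $0$ and radius $\sqrt{2}$.

c. Show that if $\alpha \ne \beta$, then $C_\alpha \cap C_\beta = \emptyset$.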


C. Directional derivatives 


As we have just observed, it is very easy to develop calculus for vector-valued functions of one real variable. We now turn to the much more intriguing situation of functions of several variables, $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ with $n > 1$. For our first look at this situation we shall set things up to use the $n = 1$ case in a significant way.
Though we are facing a situation here that we may never have seen, something significant comes to mind. Namely, we could view the function values $f(x_1, x_2, \dots, x_n)$ as depending on the single real variable $x_i$ if we just regard all the other independent variables $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$ as fixed. Then we can perform “ordinary” differentiation with respect to $x_i$. The result of this differentiation could be denoted in the usual way as

$$\frac{df}{dx_i},$$

but the universally accepted and time-honored notation is instead

$$\frac{\partial f}{\partial x_i}.$$





Here then is the actual

DEFINITION. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$, and let $1 \le i \le n$. Then the partial derivative of $f$ with respect to $x_i$ is

$$\frac{\partial f}{\partial x_i}(x) = \lim_{t \to 0} \frac{f(x_1, \dots, x_{i-1}, x_i + t, x_{i+1}, \dots, x_n) - f(x)}{t}$$

(provided the limit exists).
Notice that if $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$, then $\partial f / \partial x_i$ is a vector in $\mathbb{R}^m$.














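EXAMPLES.

$$\frac{\partial}{\partial x}(y\sin x) = y\cos x, \qquad \frac{\partial}{\partial y}(y\sin x) = \sin x,$$

$$\frac{\partial}{\partial x}(x^y) = yx^{y-1}, \qquad \frac{\partial}{\partial y}(x^y) = x^y\log x.$$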


So we now have the concept of “partial” differentiation as being “ordinary” differentiation in coordinate directions. So far, so good, but we can do much better. After all, why be restricted to coordinate directions only? Why not investigate all directions in $\mathbb{R}^n$? We now explore this vast generalization, which will indeed free us from a coordinate system entirely.

Assume $x \in \mathbb{R}^n$ is fixed, and assume $f$ is defined at least in a neighborhood of $x$, say a small closed ball $B(x, r)$.

We then consider a vector $h \in \mathbb{R}^n$ which will serve as a “direction.” This means we look at the line through $x$ in that direction, parametrized as the set of points $x + th$, $-\infty < t < \infty$. We then restrict attention to the behavior of $f$ on this line. That is, we consider the function of $t$ given as $f(x + th)$. This function is defined at least for all sufficiently small $|t|$. We then compute the $t$-derivative of this function at $t = 0$, if it exists.




















DEFINITION. The directional derivative of $f$ at $x$ in the direction $h$ is

$$\left.\frac{d}{dt} f(x + th)\right|_{t=0} = \lim_{t \to 0} \frac{f(x + th) - f(x)}{t}.$$

We shall use the notation

$$Df(x; h)$$

for this limit. Notice that if $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$, then the directional derivative $Df(x; h) \in \mathbb{R}^m$.
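The definition suggests a direct numerical estimate: evaluate $f$ along the line $x + th$ and take a difference quotient in $t$. The Python sketch below uses a symmetric difference quotient, a standard substitute for the one-sided limit chosen here for accuracy. The names `directional_derivative` and `sq_norm` are hypothetical. The test function $\|x\|^2$ is the one treated in Problem 2-34.

```python
def directional_derivative(f, x, h, t=1e-6):
    """Estimate Df(x; h) = d/dt f(x + t h) at t = 0 by a central difference in t.

    f maps a tuple of floats to a float; x and h are tuples of the same length.
    """
    plus = tuple(xi + t * hi for xi, hi in zip(x, h))
    minus = tuple(xi - t * hi for xi, hi in zip(x, h))
    return (f(plus) - f(minus)) / (2 * t)

def sq_norm(x):
    """f(x) = ||x||^2, whose directional derivative is 2 x . h (Problem 2-34)."""
    return sum(xi * xi for xi in x)

x, h = (1.0, -2.0, 0.5), (0.3, 0.4, -1.0)
print(directional_derivative(sq_norm, x, h))      # approximately -2.0
print(2 * sum(xi * hi for xi, hi in zip(x, h)))   # 2 x . h
```

With $h = 0$ the two sample points coincide, so the estimate is exactly $0$, matching $Df(x; 0) = 0$ below.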











We stress that we have restricted our attention to an arbitrary straight line through $x$ and have thus been enabled to use differentiation for a function of a single real variable. Notice that if $h = 0$ then we are not dealing with a straight line at all, but instead $f(x + t0) = f(x)$ is constant with respect to $t$ and our definition yields

$$Df(x; 0) = 0.$$


PROBLEM 2-25. Prove that for any scalar $a$,

$$Df(x; ah) = a\,Df(x; h).$$

(Notice that this result implies $Df(x; 0) = 0$.)

Directional derivatives are not very interesting for functions of a single real variable, as all the “directions” just lie on $\mathbb{R}$. The directional derivative is just a scalar multiple of the ordinary derivative $f'$:

















PROBLEM 2-26. In the special case of $\mathbb{R} \xrightarrow{f} \mathbb{R}^m$, show that for any $h \in \mathbb{R}$

$$Df(x; h) = hf'(x).$$


























PROBLEM 2-27. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ be given by

$$f(x_1, x_2) = (x_1 + x_2)\,e^{x_2 - x_1}.$$

Calculate the directional derivatives

$$Df((1, 1); h) = 3h_2 - h_1, \qquad Df((1, 0); h) = 2e^{-1}h_2.$$
PROBLEM 2-28. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^2$ be given by

$$f(r, \theta) = (r\cos\theta, r\sin\theta).$$

Calculate

$$Df((1, \theta); h) = (h_1\cos\theta - h_2\sin\theta,\ h_1\sin\theta + h_2\cos\theta).$$


PROBLEM 2-29. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^2$ be given by

$$f(x, y) = \left(\sqrt{x^2 + y^2},\ \arctan\frac{y}{x}\right).$$

Calculate

$$Df((1, 0); h) = h.$$


PROBLEM 2-30. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be given by

$$f(x) = u \cdot x + a,$$

where $u \in \mathbb{R}^n$ and $a \in \mathbb{R}$ are constants. Calculate

$$Df(x; h) = u \cdot h.$$





The next two problems give directional derivative versions of the product rule. 


PROBLEM 2-31. Suppose $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ and $\mathbb{R}^n \xrightarrow{g} \mathbb{R}$. Prove that

$$D(gf)(x; h) = g(x)\,Df(x; h) + Dg(x; h)\,f(x).$$
PROBLEM 2-32. Suppose $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ and $\mathbb{R}^n \xrightarrow{g} \mathbb{R}^m$. Prove that

$$D(g \cdot f)(x; h) = g(x) \cdot Df(x; h) + Dg(x; h) \cdot f(x).$$





There is also a version of the chain rule: 


PROBLEM 2-33. Suppose that $\mathbb{R}^n \xrightarrow{f} \mathbb{R} \xrightarrow{g} \mathbb{R}$ and that $g$ is a differentiable function. The composite function $g \circ f$ is defined by the equation $g \circ f(x) = g(f(x))$. Prove that

$$D(g \circ f)(x; h) = g'(f(x))\,Df(x; h).$$


PROBLEM 2-34. Let $f(x) = \|x\|^2$. Calculate

$$Df(x; h) = 2x \cdot h.$$


PROBLEM 2-35. Combine the two preceding problems to show that for any real number $\alpha$ and any $x \in \mathbb{R}^n$ with $x \ne 0$,

$$D(\|x\|^\alpha)(x; h) = \alpha\|x\|^{\alpha - 2}\, x \cdot h.$$

(In particular, $D(\|x\|)(x; h) = \dfrac{x \cdot h}{\|x\|}$.)
PROBLEM 2-36. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^n$ be given by

$$f(x) = \frac{x}{\|x\|^2} \quad (x \ne 0).$$

Calculate

$$Df(x; h) = \frac{h}{\|x\|^2} - \frac{2(x \cdot h)\,x}{\|x\|^4}.$$
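Before working Problem 2-36 by hand, it may help to see the formula confirmed numerically. The Python sketch below compares the stated formula with a central-difference estimate of the vector $Df(x; h)$ at one arbitrarily chosen point and direction. The names `Df_formula` and `Df_numeric` are hypothetical.

```python
def f(x):
    """f(x) = x / ||x||^2 on R^n minus the origin."""
    s = sum(c * c for c in x)
    return tuple(c / s for c in x)

def Df_formula(x, h):
    """h/||x||^2 - 2 (x.h) x / ||x||^4, the formula stated in Problem 2-36."""
    s = sum(c * c for c in x)
    xh = sum(a * b for a, b in zip(x, h))
    return tuple(hi / s - 2 * xh * xi / s ** 2 for xi, hi in zip(x, h))

def Df_numeric(x, h, t=1e-6):
    """Central-difference estimate of the vector Df(x; h)."""
    p = f(tuple(a + t * b for a, b in zip(x, h)))
    m = f(tuple(a - t * b for a, b in zip(x, h)))
    return tuple((pi - mi) / (2 * t) for pi, mi in zip(p, m))

x, h = (1.0, 2.0, -1.0), (0.5, -1.0, 2.0)
print(Df_formula(x, h))
print(Df_numeric(x, h))
```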


PARTIAL DERIVATIVES. Still working in the context of directional derivatives, we frequently pay special attention to the unit coordinate directions $e_1, e_2, \dots, e_n$. Here each vector $e_i \in \mathbb{R}^n$ is given by

$$e_i = (0, \dots, 0, 1, 0, \dots, 0),$$

where the single “1” appears in the $i$th position.


PROBLEM 2-37. Show that

$$Df(x; e_i) = \frac{\partial f}{\partial x_i}(x).$$

We also frequently use a special notation:

$$D_i f(x) = Df(x; e_i) = \frac{\partial f}{\partial x_i}(x).$$

The notation $D_i f(x)$ has the advantage over $\partial f / \partial x_i$ of not having to name the coordinates in a special way. We just have to keep track of the order in which they are written. For instance, if $f(m, p, a) = am^2p^3$, then $D_2 f = 3am^2p^2$.

Still another useful special notation represents partial derivatives with subscripts, so that

$$f_{x_i}(x) = D_i f(x) = Df(x; e_i) = \frac{\partial f}{\partial x_i}(x).$$





PROBLEM 2-38. For the function of Problem 2-29 show that

$$\frac{\partial f}{\partial y} = \left(\frac{y}{\sqrt{x^2 + y^2}},\ \frac{x}{x^2 + y^2}\right).$$


Incidentally, in physics and engineering a special notation for the unit coordinate vectors in $\mathbb{R}^3$ is in vogue:

$$i = (1, 0, 0), \quad j = (0, 1, 0), \quad k = (0, 0, 1).$$

In fact, in physics special attention is paid to unit vectors (vectors with norm 1), in that each such vector is dubbed with a circumflex (^). Thus it is correct in $\mathbb{R}^3$ to write $h = (\frac{2}{3}, \frac{2}{3}, -\frac{1}{3})$ as $\hat{h}$, whereas it is incorrect in $\mathbb{R}^2$ to write $h = (1, 1)$ as $\hat{h}$. We shall often employ this notation. Thus if you see a symbol $Df(x; \hat{h})$, you are assured that the norm $\|\hat{h}\| = 1$.








D. Pathology 


“It is good for me that I have been afflicted, that I might learn thy statutes.”
Psalm 119:71
The purpose of this entire section is to present some examples of functions which have directional derivatives with certain strange properties. (Such examples are often called “counterexamples.”) The reason for doing this is not my own love for the perverse, but rather to make sure we fully appreciate the tremendous usefulness of the concept of differentiability which will be discussed in the next section. These examples also serve to illustrate the inadequacy of the concept of directional differentiation, however appealing and useful it is.
All our examples are going to require that $n > 1$, and it so happens that $n = 2$ gives us enough room for the strange behavior we want to illustrate. Therefore, in this entire section we deal with

$$\mathbb{R}^2 \xrightarrow{f} \mathbb{R},$$

and we denote points in $\mathbb{R}^2$ with the notation $(x, y)$. We still denote the directions as $h = (h_1, h_2)$. We shall also arrange things so that the pathology occurs at the origin in all cases.

Before going on, notice that all the examples and problems in Section C had the feature that the directional derivatives were linear functions of $h$. We'll have much more to say about this in the next section, but for now we just note that a linear function of $h$ is a function of the form $c_1h_1 + c_2h_2$, where $c_1$ and $c_2$ are constants.


QUESTION 1. Is it possible for a continuous function to have a directional derivative which is a nonlinear function of the direction?


ANSWER. Yes! 
EXAMPLE.

$$f(x, y) = (x^{1/3} + y^{1/3})^3.$$

Continuity is clear. And it is easily seen that

$$Df(0; h) = (h_1^{1/3} + h_2^{1/3})^3, \quad \text{a nonlinear function of } h.$$

Incidentally, notice for example that

$$Df((1, 8); h) = 9h_1 + \tfrac{9}{4}h_2 \quad \text{is a linear function of } h.$$

Here's another:

$$f(x, y) = \begin{cases} \dfrac{x^2y}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

Continuity is clear except at the origin. But notice that

$$|f(x, y)| = \frac{x^2|y|}{x^2 + y^2} \le |y|,$$

so $f$ is continuous at $0$. (Also, $|f(x, y)| \le \frac{1}{2}|x|$.)
Now we compute

$$\frac{f(th) - f(0)}{t} = \frac{1}{t} \cdot \frac{t^3h_1^2h_2}{t^2(h_1^2 + h_2^2)} = \frac{h_1^2h_2}{h_1^2 + h_2^2}.$$

This doesn't even depend on $t$, so certainly for $h \ne 0$

$$Df(0; h) = \frac{h_1^2h_2}{h_1^2 + h_2^2}, \quad \text{a nonlinear function of } h.$$
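The nonlinearity of the second example can be seen directly. A linear $L$ would satisfy $L(e_1 + e_2) = L(e_1) + L(e_2)$. The Python sketch below evaluates the difference quotient at the origin in three directions. The name `D_at_origin` and the step `t` are hypothetical choices. For this function the quotient does not depend on `t`, apart from rounding.

```python
def f(x, y):
    """x^2 y / (x^2 + y^2), extended by f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y)

def D_at_origin(h1, h2, t=1e-8):
    """Difference quotient (f(t h) - f(0)) / t; for this f it does not depend on t."""
    return (f(t * h1, t * h2) - f(0.0, 0.0)) / t

e1 = D_at_origin(1.0, 0.0)
e2 = D_at_origin(0.0, 1.0)
diag = D_at_origin(1.0, 1.0)
print(e1, e2, diag)  # approximately 0.0 0.0 0.5, but additivity would need 0.0
```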


QUESTION 2. Is it possible for a function to have directional derivatives in every direction and not be continuous?


ANSWER. Yes! 
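EXAMPLE.

$$f(x, y) = \begin{cases} \dfrac{xy^2}{x^2 + y^4} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

PROBLEM 2-39. Show that $f$ is discontinuous at the origin. Show that $Df(0; h)$ exists for all $h$ and is given by

$$Df(0; h) = \begin{cases} h_2^2/h_1 & \text{if } h_1 \ne 0, \\ 0 & \text{if } h_1 = 0. \end{cases}$$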


One might think that perhaps the trouble with the preceding example is that the directional 
derivative is nonlinear. Here’s a somewhat more sophisticated counterexample. 


QUESTION 3. Is it possible for a function to have directional derivative equal to zero in every direction and not be continuous?


ANSWER. Yes! 


EXAMPLE. 
Let 


fears aa for (x,y) #0, 
0 tor (vy) =O. 


PROBLEM 2-40. Verify that f satisfies the conditions we have asserted for it. 


MORAL. Unlike single-variable calculus, existence of directional derivatives does not imply 
continuity of the function. More subtly, the directional derivative is not necessarily a linear 
function of the direction. We shall soon discover how wonderful it is for the directional 
derivative to depend linearly on the direction, so we shall incorporate this property into the 
definition in the following section. 


E. Differentiability of real-valued functions 


At last we turn to the actual definition we shall employ. First, we need the important definition of linearity. We shall discuss this thoroughly in Section I, but for now we give a

PROVISIONAL DEFINITION. A linear function from $\mathbb{R}^n$ to $\mathbb{R}$ is a function $L$ of the form

$$L(h) = c_1h_1 + \cdots + c_nh_n,$$

where the numbers $c_1, \dots, c_n$ are constants. Assembling the coefficients $c_1, \dots, c_n$ as the coordinates of a vector $c \in \mathbb{R}^n$, we can use our scalar product notation to write $L$ in the form

$$L(h) = c \cdot h.$$
This probably doesn't quite agree with your usual terminology for linear functions. Here are two graphs of functions from $\mathbb{R}$ to $\mathbb{R}$:

[Figure: two graphs, labeled “not linear” and “linear.”]

A function from $\mathbb{R}$ to $\mathbb{R}$ of the form $f(x) = ax + b$ is said to be affine, and is therefore linear $\iff$ $b = 0$. More generally, a function from $\mathbb{R}^n$ to $\mathbb{R}$ of the form $f(h) = c \cdot h + d$ is said to be affine. Thus every linear function is affine, and an affine function $f$ is linear $\iff$ $d = 0$.

The next thing is to remind ourselves of the definition of differentiability of a function from $\mathbb{R}$ to $\mathbb{R}$. It is that the limit

$$\lim_{y \to 0} \frac{f(x + y) - f(x)}{y}$$

exists. Rewrite: there exists a number $c$ such that

$$\lim_{y \to 0} \frac{f(x + y) - f(x) - cy}{y} = 0.$$

Here's a slight modification. Replace the denominator by $|y|$ (since the limit is 0, this doesn't change the definition):

$$\lim_{y \to 0} \frac{f(x + y) - f(x) - \overbrace{cy}^{\text{linear function of } y}}{|y|} = 0.$$

(To repeat: replacing the denominator $y$ by its absolute value does not change the value 0 of the limit.)

This is perfect! Notice the linear function in the numerator! Now that we have transformed the definition cleverly, we can immediately generalize to functions on $\mathbb{R}^n$, as follows.
DEFINITION. Let $x \in \mathbb{R}^n$ be a fixed point. Assume $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ is defined at least in a neighborhood of $x$ (a ball $B(x, r)$). Then $f$ is differentiable at $x$ if there exists a vector $c \in \mathbb{R}^n$ such that

$$\lim_{y \to 0} \frac{f(x + y) - f(x) - c \cdot y}{\|y\|} = 0.$$


REMARK. This definition is absolutely crucial. Notice the vast difference between it and the concept of “directional” derivative. There is nothing “directional” in this definition: the variable point $y$ tends to $0$ in norm (which is true $\iff$ all coordinates of $y$ tend to $0$) with no restriction on its direction.


REMARK. There is a nice geometric interpretation of this definition. In the case of a function from $\mathbb{R}$ to $\mathbb{R}$, the existence of $f'(x)$ has the familiar “tangent line” interpretation, namely that the graph of $f$ near the point $x$ looks affine on the microscopic scale. That is, $f(x) + f'(x)y$ is a very good approximation to $f(x + y)$ for small $y$. For instance, here are three sketches of the graph of $x \mapsto x^2$ near $x = 0$:

[Figure: three sketches of the graph of $x \mapsto x^2$ near $x = 0$, at successively smaller scales labeled 1/2, 1/4, and 1/16.]


The same sort of thing is true in our case: the affine function of $y$ given by $f(x) + c \cdot y$ is a very good approximation to the function $f(x + y)$ for small $\|y\|$. The difference between the given function and the affine function tends to zero as $\|y\| \to 0$, and it does so at a faster rate than $\|y\|$ itself: the quotient of the two even tends to zero.

Here’s an easy fact: 


if f is differentiable at x, then the directional derivatives exist at x. 


To prove this, let $y$ be restricted to have the form $th$ (assuming $h \neq 0$). Then the differentiability of $f$ at $x$ implies that

$$0=\lim_{t\to 0}\frac{f(x+th)-f(x)-c\bullet th}{|t|\,\|h\|}.$$
Multiply by the number $\|h\|$ and multiply by $|t|/t$:

$$0=\lim_{t\to 0}\frac{f(x+th)-f(x)-t\,c\bullet h}{t}=\lim_{t\to 0}\frac{f(x+th)-f(x)}{t}-c\bullet h.$$


Thus $Df(x;h) = c\bullet h$. In particular, if $h = \hat e_i$, then the $i$th component of $c$ is

$$c_i=\frac{\partial f}{\partial x_i}(x).$$

(Thus $c$ is uniquely determined by $f$.)


DEFINITION. The vector $c$ is called the gradient of $f$ at $x$ and is written with two different notations:

$$(\operatorname{grad} f)(x)=(\nabla f)(x)=\left(\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)\right).$$


Thus we have shown that if $f$ is differentiable at $x$, then

$$Df(x;h)=\nabla f(x)\bullet h.$$
In particular, notice the very pleasant situation that $Df(x;h)$ is a linear function of $h$.
The notations we have chosen for the gradient are quite standard. The symbol $\nabla$, an upside-down delta, is called del. It also has the rather obsolete name nabla, and its properties are still sometimes called the nabla calculus.
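As a numerical sanity check on the identity $Df(x;h) = \nabla f(x)\bullet h$, here is a minimal Python sketch. It approximates each partial derivative and the directional derivative by symmetric difference quotients and compares the two sides. The sample function $f(x_1,x_2) = x_1^2x_2 + \sin x_2$, the point, the direction, and the step sizes are illustrative assumptions; finite differences only approximate the limits in the definitions.

```python
import math

def f(x):
    # Hypothetical sample function f(x1, x2) = x1^2 * x2 + sin(x2); it is C^1 on R^2
    return x[0] ** 2 * x[1] + math.sin(x[1])

def partial(func, x, i, eps=1e-6):
    # Symmetric difference quotient approximating the i-th partial derivative at x
    xp, xm = list(x), list(x)
    xp[i] += eps
    xm[i] -= eps
    return (func(xp) - func(xm)) / (2 * eps)

def gradient(func, x):
    return [partial(func, x, i) for i in range(len(x))]

def directional(func, x, h, t=1e-6):
    # Symmetric difference quotient in t approximating Df(x; h)
    xp = [xi + t * hi for xi, hi in zip(x, h)]
    xm = [xi - t * hi for xi, hi in zip(x, h)]
    return (func(xp) - func(xm)) / (2 * t)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, h = [1.0, 2.0], [3.0, -1.0]
lhs = directional(f, x, h)
rhs = dot(gradient(f, x), h)
print(lhs, rhs)  # both close to 11 - cos 2
```

Here $\nabla f(1,2) = (4, 1+\cos 2)$ exactly, so both printed numbers should be close to $12 - (1 + \cos 2)$.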
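We can now easily state some expected calculus formulas for the gradient. We assume that $f$ and $g$ are real-valued functions defined on subsets of $\mathbb{R}^n$. Then we have

$$\begin{aligned}
\nabla(f+g) &= \nabla f+\nabla g;\\
\nabla(af) &= a\nabla f \quad\text{if } a\in\mathbb{R} \text{ is constant};\\
\nabla f &= 0 \quad\text{if } f \text{ is constant};\\
\nabla(fg) &= f\nabla g+g\nabla f.
\end{aligned}$$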


We do not treat the chain rule at the present time, as we shall discuss it thoroughly in 
Section K. 














PROBLEM 2-41. Prove the above formulas. In addition, state and prove the corre- 
sponding quotient rule. 
PROBLEM 2-42.

a. Let $f$ be an affine function: $f(x) = c\bullet x + d$. Show that $f$ is differentiable at any $x$, and $(\nabla f)(x) = c$.

b. Let $f$ be the quadratic function: $f(x) = \|x\|^2$. Show that $f$ is differentiable at any $x$, and $(\nabla f)(x) = 2x$. Thus

$$\nabla(c\bullet x+d)=c,\qquad \nabla\left(\|x\|^2\right)=2x.$$


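(Solution: (a) $f(x+y) - f(x) - c\bullet y = c\bullet(x+y) + d - c\bullet x - d - c\bullet y = 0$. Thus

$$\lim_{y\to 0}\frac{f(x+y)-f(x)-c\bullet y}{\|y\|}=0.$$

(b) $f(x+y) = \|x+y\|^2 = \|x\|^2 + 2x\bullet y + \|y\|^2 = f(x) + 2x\bullet y + \|y\|^2$. Thus

$$\frac{f(x+y)-f(x)-2x\bullet y}{\|y\|}=\|y\|\to 0\quad\text{as } y\to 0.$$

Thus $(\nabla f)(x) = 2x$.)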
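Part (b) of the solution says the remainder quotient is exactly $\|y\|$. A minimal Python sketch can evaluate that quotient for smaller and smaller $y$; the point $x$ and the shrinking vectors $y$ are arbitrary illustrative choices. Up to rounding, the quotient matches $\|y\|$ and so tends to 0.

```python
import math

def f(x):
    # f(x) = ||x||^2
    return sum(xi * xi for xi in x)

def norm(y):
    return math.sqrt(sum(yi * yi for yi in y))

def remainder_quotient(x, y):
    # (f(x + y) - f(x) - 2x . y) / ||y||
    x_plus_y = [a + b for a, b in zip(x, y)]
    linear_part = sum(2 * a * b for a, b in zip(x, y))
    return (f(x_plus_y) - f(x) - linear_part) / norm(y)

x = [1.0, -2.0, 0.5]
results = []
for s in [1e-1, 1e-2, 1e-3]:
    y = [s, -s, 2 * s]
    results.append((remainder_quotient(x, y), norm(y)))
    print(s, results[-1])  # the two numbers agree and shrink with s
```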











We now quickly show that just as in the special case of Section B ($\mathbb{R} \to \mathbb{R}^m$), differentiability $\Rightarrow$ continuity:


THEOREM. If $f$ is differentiable at $x$, then $f$ is continuous at $x$.
PROOF. This is très simple: consider the two limits,

$$\lim_{y\to 0}\frac{f(x+y)-f(x)-c\bullet y}{\|y\|}=0,\qquad \lim_{y\to 0}\|y\|=0.$$

Multiply them, using the fact that the limit of a product is the product of the limits:

$$\lim_{y\to 0}\bigl(f(x+y)-f(x)-c\bullet y\bigr)=0.$$

But of course $c\bullet y$ has limit zero (by the Schwarz inequality, $|c\bullet y| \le \|c\|\,\|y\|$), so we conclude

$$\lim_{y\to 0}\bigl(f(x+y)-f(x)\bigr)=0;$$

i.e.,

$$\lim_{y\to 0}f(x+y)=f(x).$$
QED


We are surely elated that in the case of differentiability, $Df(x;h)$ is indeed a linear function of $h$. However, the converse is not valid: the counterexample for Question 3 had $Df(0;h) = 0$ for all $h$ (thus linear), and yet $f$ was not even continuous at 0; we now know that such an $f$ certainly could not be differentiable at 0.

You may say, “Aha! Suppose we assume that $Df(x;h)$ exists and is a linear function of $h$ and $f$ is continuous at $x$. Then perhaps $f$ is differentiable at $x$”:


QUESTION 4. Is it possible that a continuous function with Df(0;h) = 0 for all h 
not be differentiable at 0? 


ANSWER. Yes! 
Here’s an example: 


PROBLEM 2-43. Let

$$f(x,y)=\begin{cases}\;\dots & \text{for } (x,y)\neq(0,0),\\ 0 & \text{for } (x,y)=(0,0).\end{cases}$$

a. Show that $f$ is continuous at $(0,0)$.
b. Show that $Df(0;h) = 0$ for all $h$.
c. Show that $f$ is not differentiable at 0.

(The hard part is c. Here’s a proof by contradiction. If $f$ were differentiable at 0, show that it would follow that $\nabla f(0) = 0$. Conclude that

$$\lim_{(x,y)\to 0}\frac{f(x,y)}{\sqrt{x^2+y^2}}=0.$$

Now show that this is not true.)


Therefore our necessary conditions for $f$ to be differentiable at $x$, namely (1) $f$ is continuous at $x$ and (2) $Df(x;h)$ is a linear function of $h$, turn out not to be sufficient to ensure differentiability.
PROBLEM 2-44. As might be expected, the fourth calculus formula given above, the product rule, is more complicated to prove than the other easy calculus rules. Here is a lemma which will be useful in giving a proof: show that if $f$ is differentiable at $x$, then there exists a constant $C$ such that

$$|f(x+y)-f(x)|\le C\|y\|$$

for all sufficiently small $y \in \mathbb{R}^n$.





The above inequality is called a Lipschitz condition for f at the point x. It is named for 
the German mathematician Rudolf Otto Sigismund Lipschitz. Notice that the inequality gives 
another proof of the continuity of a differentiable function. 


Now it is easy to prove the product rule: 


PROBLEM 2-45. Prove that if $f$ and $g$ are differentiable at $x$, then the product $fg$ is differentiable at $x$, and

$$\nabla(fg)(x)=f(x)(\nabla g)(x)+g(x)(\nabla f)(x).$$

(HINT: write $f = f_1 + f(x)$ and $g = g_1 + g(x)$ ($x$ is a fixed point). Express $fg$ as a sum of four products and use the preceding problem to show that $\nabla(f_1g_1)(x) = 0$.)
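PROBLEM 2-46. What is wrong with this “proof” for the preceding problem?
a. We know

$$\nabla f=\left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right).$$

b. We also know that

$$\frac{\partial(fg)}{\partial x_i}=f\frac{\partial g}{\partial x_i}+g\frac{\partial f}{\partial x_i}.$$

c. Therefore,

$$\begin{aligned}
\nabla(fg) &= \left(\frac{\partial(fg)}{\partial x_1},\dots,\frac{\partial(fg)}{\partial x_n}\right)\\
&= \left(f\frac{\partial g}{\partial x_1}+g\frac{\partial f}{\partial x_1},\dots\right)\\
&= f\left(\frac{\partial g}{\partial x_1},\dots\right)+g\left(\frac{\partial f}{\partial x_1},\dots\right)\\
&= f\nabla g+g\nabla f.
\end{aligned}$$

QED?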


We have now made significant progress in our understanding of multivariable calculus, 
thanks to the all-important definition of differentiability. In the next section we shall learn 
how it can be quite useful. 


F. Sufficient condition for differentiability 


The property of differentiability is absolutely crucial in multivariable calculus. The defini- 
tion is so technically involved that it is hard to verify directly. Happily, there is a sufficient 
condition that is extremely useful, and handles almost all cases that we ever need. We now 
state and prove it. 





THEOREM. Assume $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ is defined in a neighborhood of $x$, and assume the partial derivatives $\partial f/\partial x_1, \dots, \partial f/\partial x_n$ all exist in this neighborhood. Furthermore, assume these partial derivatives are all continuous at $x$. Then $f$ is differentiable at $x$.




















PROOF. The main tool we shall employ in this proof is the famous mean value theorem of single-variable calculus. This theorem asserts (under the proper hypothesis) that for $-\infty < a < b < \infty$

$$g(b)-g(a)=g'(c)(b-a),$$

where $c$ is some point in the interval $a < c < b$.




To verify the differentiability of $f$ at $x$, we need to compare $f(x+y)$ and $f(x)$. We do this by moving from $x$ to $x+y$ along $n$ “steps” taken one coordinate at a time (a “taxicab” trip). Since there is nothing essentially different between the cases $\mathbb{R}^n$ and $\mathbb{R}^2$, I am content for the sake of brevity to perform the proof for the case $n = 2$ only.


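We use the notation $c = \nabla f(x) = (\partial f/\partial x_1, \partial f/\partial x_2)$. (Of course, the hypothesis guarantees the existence of these partial derivatives.) Then for any $\varepsilon > 0$ there exists $\delta > 0$ such that, for $i = 1, 2$,

$$\|y\|<\delta\implies\left|\frac{\partial f}{\partial x_i}(x+y)-c_i\right|<\varepsilon/2;$$

this is where we use the hypothesis of continuity of $\partial f/\partial x_i$ at $x$. (We also take $\delta$ small enough that the ball $B(x,\delta)$ lies in the neighborhood where the partial derivatives exist.)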


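Now we write in two “steps”

$$f(x+y)-f(x)=\bigl[f(x_1+y_1,x_2+y_2)-f(x_1,x_2+y_2)\bigr]+\bigl[f(x_1,x_2+y_2)-f(x_1,x_2)\bigr].$$

[Figure: the taxicab path from $(x_1,x_2)$ up to $(x_1,x_2+y_2)$ and then across to $(x_1+y_1,x_2+y_2)$, with the intermediate points $(x_1,x_2+w)$ and $(x_1+z,x_2+y_2)$ marked.]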


This is perfectly arranged to use the mean value theorem on each term to produce, for any $\|y\| < \delta$,

$$f(x+y)-f(x)=\frac{\partial f}{\partial x_1}(x_1+z,x_2+y_2)\,y_1+\frac{\partial f}{\partial x_2}(x_1,x_2+w)\,y_2,$$

where $z$ is between 0 and $y_1$, and $w$ is between 0 and $y_2$. Therefore,


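$$\begin{aligned}
|f(x+y)-f(x)-c\bullet y| &= \left|\left(\frac{\partial f}{\partial x_1}(x_1+z,x_2+y_2)-c_1\right)y_1+\left(\frac{\partial f}{\partial x_2}(x_1,x_2+w)-c_2\right)y_2\right|\\
&\le \left|\frac{\partial f}{\partial x_1}(x_1+z,x_2+y_2)-c_1\right||y_1|+\left|\frac{\partial f}{\partial x_2}(x_1,x_2+w)-c_2\right||y_2|\\
&\le \tfrac{\varepsilon}{2}\|y\|+\tfrac{\varepsilon}{2}\|y\|\\
&= \varepsilon\|y\|.
\end{aligned}$$

(The third line uses the choice of $\delta$, since $\|(z,y_2)\| \le \|y\| < \delta$ and $\|(0,w)\| \le \|y\| < \delta$, together with $|y_i| \le \|y\|$.) Therefore,

$$0<\|y\|<\delta\implies\frac{|f(x+y)-f(x)-c\bullet y|}{\|y\|}\le\varepsilon.$$

Thus we have proved that

$$\lim_{y\to 0}\frac{f(x+y)-f(x)-c\bullet y}{\|y\|}=0.$$

Therefore, $f$ is differentiable at $x$ (and $c = \nabla f(x)$).
QED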


The theorem we have just proved is quite wonderful, as is its proof. This is just the 
sort of mathematical proof that essentially works itself. Once we decided to use the partial 
derivatives, then the diagram in the body of the proof suggests itself, as does the use of the 
mean value theorem. 

Whereas the next problem holds no interest for calculus that I am aware of, working 
through it may enhance your understanding of the above important proof. It is an “improve- 
ment” of the theorem to be sure, as it provides the same conclusion with weaker hypotheses. 




















PROBLEM 2-47. Assume $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ is defined in a neighborhood of $x$, and assume the partial derivatives $\partial f/\partial x_1, \dots, \partial f/\partial x_{n-1}$ all exist in this neighborhood and are continuous at $x$. Assume also that $\partial f/\partial x_n$ exists at $x$. Prove that $f$ is differentiable at $x$.

(HINT: limit attention to $n = 2$ and apply the mean value theorem only to the term $f(x_1 + y_1, x_2 + y_2) - f(x_1, x_2 + y_2)$ in the above proof.)


It so happens that in practice an even weaker theorem than the one we have proved is 
definitive in an amazing variety of situations: 





COROLLARY. Assume $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ is defined in an open ball $B(x,r)$, and that the partial derivatives $\partial f/\partial x_1, \dots, \partial f/\partial x_n$ all exist and are continuous functions in $B(x,r)$. Then $f$ is differentiable at every point of $B(x,r)$.
This result is an immediate corollary of the theorem. The reason it is so useful is that we can very often tell at a glance that a function has these continuous partial derivatives, and thus it must be differentiable. For instance,

$$f(x_1,x_2,x_3,x_4)=\log(\dots)+\sin x_3\cos(\sin x_1)$$





gives a function $f$ that clearly satisfies the conditions on all of $\mathbb{R}^4$; for all we need do is the mental exercise of thinking about how to compute the four partial derivatives $\partial f/\partial x_i$. For that we use the full power of one-variable calculus (the chain rule, the product rule, etc.) and realize that the formulas we thereby obtain give continuous functions on $\mathbb{R}^4$. The only possible “trouble” that could arise in this particular example would occur because the argument of log might be 0. As this argument is clearly always positive, that cannot happen. We conclude that each $\partial f/\partial x_i$ is continuous and thus that $f$ is differentiable at every point of $\mathbb{R}^4$.








EXAMPLE. Here’s a reworking of Problem 2-27:

$$f(x_1,x_2)=(x_1+x_2)e^{x_1-x_2}.$$

We see at a glance that $f$ satisfies the conditions of the corollary. If we want the directional derivative at any $x \in \mathbb{R}^2$, we can use the formula on p. 2-26:

$$\frac{\partial f}{\partial x_1}=(1+x_1+x_2)e^{x_1-x_2},\qquad \frac{\partial f}{\partial x_2}=(1-x_1-x_2)e^{x_1-x_2}.$$

Thus

$$Df(x;h)=\bigl[(1+x_1+x_2)h_1+(1-x_1-x_2)h_2\bigr]e^{x_1-x_2}.$$

No difference quotients needed! No need to substitute $x + th$, compute the $t$-derivative, and substitute $t = 0$. It’s just algebra!
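The closed-form directional derivative can be compared against a difference quotient. Here is a minimal Python sketch, assuming $f(x_1,x_2) = (x_1+x_2)e^{x_1-x_2}$ as displayed above; the sample points, directions, and the step $t$ are arbitrary choices.

```python
import math

def f(x1, x2):
    return (x1 + x2) * math.exp(x1 - x2)

def Df_formula(x1, x2, h1, h2):
    # [(1 + x1 + x2) h1 + (1 - x1 - x2) h2] e^(x1 - x2), i.e. grad f(x) . h
    return ((1 + x1 + x2) * h1 + (1 - x1 - x2) * h2) * math.exp(x1 - x2)

def Df_quotient(x1, x2, h1, h2, t=1e-6):
    # Symmetric difference quotient (f(x + th) - f(x - th)) / (2t)
    return (f(x1 + t * h1, x2 + t * h2) - f(x1 - t * h1, x2 - t * h2)) / (2 * t)

samples = [(0.0, 0.0, 1.0, 0.0), (0.5, -0.3, 2.0, 1.0), (-1.0, 0.7, -0.4, 3.0)]
for s in samples:
    print(s, Df_formula(*s), Df_quotient(*s))  # the two values agree closely
```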


DEFINITION. A function which satisfies the hypothesis of the above corollary is said to be continuously differentiable. It is also said to be of class $C^1$. (A continuous function is of class $C^0$.)
Here we give an example which illustrates the power of this corollary. Consider the function given as the norm, $f(x) = \|x\|$ for $x \in \mathbb{R}^n$, $x \neq 0$. Since $f(x)$ is the square root of the positive quantity $x_1^2 + \dots + x_n^2$, the chain rule of single-variable calculus makes it clear that all the partial derivatives $\partial f/\partial x_i$ are continuous for $x \neq 0$; thus $f$ is of class $C^1$ for $x \neq 0$. We already have from Problem 2-35 the directional derivative

$$Df(x;h)=\frac{x\bullet h}{\|x\|}.$$
Since $f$ is differentiable (by the corollary),

$$\nabla f(x)\bullet h=Df(x;h).$$

Therefore,

$$\nabla f(x)\bullet h=\frac{x\bullet h}{\|x\|}\quad\text{for all } h\in\mathbb{R}^n.$$

We can now conclude that $\nabla f(x) = x/\|x\|$. (See Problem 1-11.) We record this result in the form

$$\nabla\|x\|=\frac{x}{\|x\|}.$$
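A quick numerical check of $\nabla\|x\| = x/\|x\|$ at a point $x \neq 0$, as a minimal Python sketch; the point (chosen so that $\|x\| = 13$) and the step size are illustrative assumptions.

```python
import math

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def numerical_gradient(func, x, eps=1e-6):
    # Symmetric difference quotient for each partial derivative
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((func(xp) - func(xm)) / (2 * eps))
    return g

x = [3.0, -4.0, 12.0]                 # ||x|| = 13
approx = numerical_gradient(norm, x)
exact = [xi / norm(x) for xi in x]    # x / ||x||
print(approx)
print(exact)
```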











PROBLEM 2-48. Show that for any real number $a$,

$$\nabla\|x\|^a=a\|x\|^{a-2}x\quad\text{for } x\neq 0.$$

For which $a$ is this equation also valid for $x = 0$?


There are two common misconceptions concerning differentiability. One is the idea that 
our sufficient condition is also necessary. The following example shows this not to be the case 
even for single-variable calculus: 

















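PROBLEM 2-49. Define $\mathbb{R} \xrightarrow{f} \mathbb{R}$ by

$$f(x)=\begin{cases}x^2 & \text{if } x \text{ is irrational},\\ 0 & \text{if } x \text{ is rational}.\end{cases}$$

Prove the following:
a. $f$ is continuous only at 0.
b. $f$ is differentiable at 0, and $f'(0) = 0$.
c. $f$ is differentiable only at 0.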
The second misconception is that the mere existence of the partial derivatives in a ball
automatically implies their continuity. This again is not true even for single-variable calculus. 
Here is an example found in almost all calculus texts: 




















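PROBLEM 2-50. Define $\mathbb{R} \xrightarrow{f} \mathbb{R}$ by

$$f(x)=\begin{cases}x^2\sin\dfrac{1}{x} & \text{for } x\neq 0,\\ 0 & \text{for } x=0.\end{cases}$$

Prove that
a. $f$ is differentiable on all of $\mathbb{R}$;
b. $f'(0) = 0$;
c. $f'(x)$ is not a continuous function of $x$ at $x = 0$.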
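The failure of continuity in part c can be seen numerically. For $x \neq 0$ the ordinary rules give $f'(x) = 2x\sin(1/x) - \cos(1/x)$, and the minimal Python sketch below evaluates it along the sequences $x = 1/(2\pi k)$ and $x = 1/((2k+1)\pi)$, both tending to 0, while the difference quotient at 0 stays tiny. This is a numerical illustration under floating-point arithmetic, not a proof.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0, and f(0) = 0
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def f_prime(x):
    # For x != 0: product and chain rules give 2x sin(1/x) - cos(1/x); f'(0) = 0 by part b
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

def quotient_at_zero(t):
    # (f(t) - f(0)) / t = t sin(1/t), which is bounded by |t|
    return (f(t) - f(0)) / t

near_minus_one = [f_prime(1 / (2 * math.pi * k)) for k in range(1, 6)]
near_plus_one = [f_prime(1 / ((2 * k + 1) * math.pi)) for k in range(1, 6)]
quotients = [quotient_at_zero(10.0 ** -j) for j in range(1, 6)]
print(near_minus_one)  # each value is close to -1
print(near_plus_one)   # each value is close to +1
print(quotients)       # shrinking toward f'(0) = 0
```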


G. A first look at critical points 


We'll eventually present critical points in great depth, and in fact won’t finish the discussion 
until Chapter 4, but already we are able to discuss the concept to some extent. 


DEFINITION. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be differentiable at $x$. Then $x$ is called a critical point of $f$ if

$$\nabla f(x)=0.$$

















This terminology should already be familiar to you from single-variable calculus. Notice that the condition for $x$ to be a critical point can be expressed in terms of partial derivatives:

$$\frac{\partial f}{\partial x_i}(x)=0\quad\text{for } 1\le i\le n.$$


Another important concept: 





DEFINITION. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be defined on a set $A \subset \mathbb{R}^n$ and let $x_0 \in A$. We say that $f$ has a global maximum at $x_0$ if

$$f(x)\le f(x_0)\quad\text{for all } x\in A.$$

We say that $f$ has a local maximum at $x_0$ if there exists $r > 0$ such that

$$f(x)\le f(x_0)\quad\text{for all } x\in A\cap B(x_0,r).$$
(Notice that a global maximum is certainly a local maximum, but not conversely.) We define similarly global minimum and local minimum. We say that $f$ has a local extreme value at $x_0$ if it has a local maximum or a local minimum at $x_0$.

As in single-variable calculus, we have the easy 








THEOREM. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be differentiable at $x_0$ and also have a local extreme value at $x_0$. Then $x_0$ is a critical point of $f$. The converse statement does not necessarily hold, even if $n = 1$.








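PROOF. We are content to handle the case of a local minimum. Then for any $h \in \mathbb{R}^n$,

$$f(x_0+th)\ge f(x_0)\quad\text{for all sufficiently small } |t|.$$

Therefore for some $\varepsilon > 0$

$$\frac{f(x_0+th)-f(x_0)}{t}\;\begin{cases}\ge 0 & \text{for } 0<t<\varepsilon,\\ \le 0 & \text{for } -\varepsilon<t<0.\end{cases}$$

Now let $t \to 0$ to achieve both the inequalities

$$Df(x_0;h)\ge 0\quad\text{and}\quad Df(x_0;h)\le 0.$$

Therefore, $Df(x_0;h) = 0$ for all directions $h \in \mathbb{R}^n$. In particular, $\partial f/\partial x_i = 0$ at $x_0$ for $1 \le i \le n$. Thus $x_0$ is a critical point. For the statement about a converse, most people’s favorite example is $f(x) = x^3$ for $x \in \mathbb{R}$: 0 is a critical point, but $f$ has no local extreme value there.
QED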





EXAMPLE. If $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ is given by $f(x,y) = xy^2$, then $\nabla f = (y^2, 2xy)$. Thus every point of the form $(x, 0)$ is a critical point. Notice that critical points do not have to be isolated.








EXAMPLE. Suppose we want to find all the critical points of the function given as $f(x,y) = 5x^2y + xy^2 + 15xy$. Then we compute the two partial derivatives to get the two scalar equations

$$\begin{aligned}10xy+y^2+15y&=0,\\ 5x^2+2xy+15x&=0.\end{aligned}$$


There are two things to notice before proceeding. Namely, we have two equations for two 
unknowns (x and y) and they are nonlinear. This is the usual situation, and usually it will be 
difficult or impossible to solve the equations explicitly. However, in this particular example
it’s pretty simple. The equations can be rewritten 


$$\begin{aligned}y(10x+y+15)&=0,\\ x(5x+2y+15)&=0.\end{aligned}$$

The first equation asserts that

$$y=0\quad\text{or}\quad 10x+y=-15,$$

and the second that

$$x=0\quad\text{or}\quad 5x+2y=-15.$$

There are four possibilities. The least obvious one is the one in which the two affine equations are satisfied, and elimination gives the solution $x = -1$, $y = -5$. Thus the four critical points of $f$ are

$$(0,0),\quad(0,-15),\quad(-3,0),\quad(-1,-5).$$
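The case analysis can be checked with exact rational arithmetic. In this minimal Python sketch (using the standard `fractions` module), the fourth point comes from solving the two affine equations by Cramer's rule, and every point found is then substituted back into both partial derivatives; `solve_2x2` is a helper named only for this illustration.

```python
from fractions import Fraction

def grad_f(x, y):
    # Partial derivatives of f(x, y) = 5x^2 y + x y^2 + 15xy
    return (10 * x * y + y * y + 15 * y, 5 * x * x + 2 * x * y + 15 * x)

def solve_2x2(a, b, e, c, d, g):
    # Solve a*x + b*y = e, c*x + d*y = g by Cramer's rule (assumes nonzero determinant)
    det = a * d - b * c
    return Fraction(e * d - b * g, det), Fraction(a * g - e * c, det)

points = {
    (Fraction(0), Fraction(0)),        # y = 0 and x = 0
    (Fraction(0), Fraction(-15)),      # x = 0 and 10x + y = -15
    (Fraction(-3), Fraction(0)),       # y = 0 and 5x + 2y = -15
    solve_2x2(10, 1, -15, 5, 2, -15),  # 10x + y = -15 and 5x + 2y = -15
}
for p in sorted(points):
    print(p, grad_f(*p))  # every gradient is (0, 0)
```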





The next eleven problems give functions defined on (a subset of) $\mathbb{R}^n$. All of them are of class $C^1$, and thus are differentiable. You are to find all the critical points of each.


PROBLEM 2-51. 
f(z, y) = 32? — 2ay + 3y? — xy’. 


PROBLEM 2-52. : : 
x y 
f(@,y) = — — 24+ — — ay. 
Yy x 


Here $f$ is defined only for $x \neq 0$, $y \neq 0$.


PROBLEM 2-53. 


f(a, y) = 27y? (2a + 3y — 6). 


PROBLEM 2-54. 





f(x,y) = 


Here a, b are nonzero constants. 
PROBLEM 2-55.


PROBLEM 2-56. 
f(z,y) =zytatty™. 


PROBLEM 2-57. 





$$f(x,y,z)=xyz+ax^{-1}+by^{-1}+cz^{-1}.$$


Here a, b, c are nonzero constants. There are two qualitatively different cases, depending 
on the sign of abc. 


PROBLEM 2-58. 





$$f(x,y,z,w)=xyzw+ax^{-1}+by^{-1}+cz^{-1}+dw^{-1}.$$


Here a, b, c, d are nonzero constants. 


PROBLEM 2-59. 


PROBLEM 2-60.

$$f(x)=\left(a_1x_1^2+\dots+a_nx_n^2\right)e^{-\|x\|^2}.$$

Here $\|x\|$ is the norm of $x$, and the constants satisfy $a_1 > a_2 > \cdots$
(There are $2n+1$ critical points.)


PROBLEM 2-61.

$$f(x,y)=(x+y)e^{-\sqrt{x^2+y^2}}.$$

(There are two critical points.)
PROBLEM 2-62. The preceding problem brings up an issue.

















a. Prove that the function $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ given as the norm, $f(x) = \|x\|$, is differentiable at every $x \neq 0$, and is not differentiable at $x = 0$.








b. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be defined as

















f(x) = aye", 
Prove that $f$ is differentiable even at $x = 0$, and calculate $(\nabla f)(0)$.


PROBLEM 2-63. Let a be a fixed real number and find the critical points of the 
function on R” defined by f(x) = ||x||?+21+<al|a||. (There will be three cases depending 
on what a is.) 





EXAMPLE. The critical point structure of this function will prove to be very enlightening. Define $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ by

$$f(x,y)=(x^2-1)^2+(x^2-e^y)^2.$$

As $f$ is given as a sum of two squares, it is clear that $f(x,y) \ge 0$. Moreover, $f(x,y) = 0 \iff x^2 = 1$ and $x^2 = e^y$. Thus $f$ attains its global minimum value precisely at the two points

$$(1,0)\quad\text{and}\quad(-1,0).$$

We know these are critical points. Let us see if there are others:

$$\begin{aligned}\partial f/\partial x&=4x(x^2-1)+4x(x^2-e^y)=0,\\ \partial f/\partial y&=-2e^y(x^2-e^y)=0.\end{aligned}$$

The second of these equations of course requires $x^2 = e^y$, and then the first requires $4x(x^2-1) = 0$. Thus $x = 0$, $1$, or $-1$. The value $x = 0$ is excluded by $x^2 = e^y$. We then have $1 = e^y$, so $y = 0$. Thus the two points we found by inspection are the only critical points.
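A short Python check of this example, as a minimal sketch (the sample values of $y$ on the line $x = 0$ are arbitrary): the gradient formulas above vanish at $(\pm 1, 0)$, where $f = 0$, while on the excluded line $x = 0$ the $y$-partial equals $2e^{2y} > 0$.

```python
import math

def f(x, y):
    return (x * x - 1) ** 2 + (x * x - math.exp(y)) ** 2

def grad_f(x, y):
    ey = math.exp(y)
    # (df/dx, df/dy) as computed in the text
    return (4 * x * (x * x - 1) + 4 * x * (x * x - ey), -2 * ey * (x * x - ey))

minima = [(1.0, 0.0), (-1.0, 0.0)]
for p in minima:
    print(p, f(*p), grad_f(*p))  # f = 0 and gradient (0, 0)

# On x = 0 the y-partial is 2e^(2y), which never vanishes
ys = (-3.0, -1.0, 0.0, 2.0)
line_values = [grad_f(0.0, y)[1] for y in ys]
print(line_values)
```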


MORAL. Situations can be much more complicated in two variables than in one. A differ- 
entiable function defined on an interval in R cannot have two global minima unless it has at 
least one more critical point (a local maximum), as indicated in the following sketch. But 
with two independent variables there is more room to “maneuver.” We might say that there 
doesn’t need to be a mountain peak between two lakes. 
PROBLEM 2-64. Define $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ by

$$f(x,y)=3xe^y-x^3-e^{3y}.$$

a. Show that $(1,0)$ is the only critical point.

b. Show that $(1,0)$ is a local maximum. (You may want to show first that $f(x,y) \le 2x^{3/2} - x^3$ for all $x > 0$, $-\infty < y < \infty$.)

c. Show that $(1,0)$ is not a global maximum.

d. Give the moral of this example.


H. Geometric significance of the gradient 


In this section we return to the general situation

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}$$

in which $f$ is differentiable at the point $x$. We have defined the gradient $\nabla f(x)$ and we know “algebraically” how to compute it in terms of partial derivatives,

$$\nabla f(x)=\left(\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)\right),$$

but we now want to explore the geometry which is contained in this vector.
First, it could be that $x$ is a critical point of $f$. By definition this means that $\nabla f(x) = 0$, and the geometrical situation is quite clear.

Thus we assume from now on that $x$ is not a critical point; that is,

$$\nabla f(x)\neq 0.$$


We want to examine thoroughly the equation (p. 2-26) which relates the directional derivative and the gradient,

$$Df(x;h)=\nabla f(x)\bullet h.$$

To recapitulate, the definition of $Df(x;h)$ is very geometrical in nature, and so is that of $\nabla f(x)$. We also have a way of computing $\nabla f$ almost algebraically, using the coordinates of $\mathbb{R}^n$ and the partial derivatives of $f$ at $x$. And then computing the directional derivative really does become a matter of linear algebra.

What we are going to do now is add to this situation our geometrical understanding of the 
dot product so that we emerge with a geometrical understanding of the gradient. 


Therefore we now consider the behavior of $\nabla f(x)\bullet h$ as the direction of $h$ varies. In order to make this significant, we shall now work exclusively with unit vectors $h$ (see p. 2-21 for this notation).











Geometrically, this means that we are actually looking at “directions” $h$ in $\mathbb{R}^n$, emanating from $x$, and the directional derivative $Df(x;h)$ is a measure of the “rate of increase of $f$ at $x$ in the direction $h$.”





Let $\theta$ denote the angle between the gradient $\nabla f(x)$ and the direction $h$. We then recall from p. 1-17 that

[Figure: the vectors $\nabla f(x)$ and $h$ at the point $x$, with the angle $\theta$ between them.]

$$Df(x;h)=\|\nabla f(x)\|\cos\theta.$$

(Remember: $\|h\| = 1$.) This equation is very revealing. It shows that the maximum value of $Df(x;h)$ is the norm of the gradient $\|\nabla f(x)\|$, and this maximum rate of increase is realized $\iff$ $h$ is the unit vector in the same direction as $\nabla f(x)$.

(Also, the minimum of $Df(x;h)$ is $-\|\nabla f(x)\|$, and this occurs $\iff$ $h$ is the unit vector in the same direction as $-\nabla f(x)$.)
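The extremal property can be illustrated by brute force in $\mathbb{R}^2$. This minimal Python sketch assumes a hypothetical gradient vector $\nabla f(x) = (3, 4)$ and samples unit vectors $h = (\cos\varphi, \sin\varphi)$ at 3600 equally spaced angles. The largest value of $\nabla f(x)\bullet h$ should be close to $\|\nabla f(x)\| = 5$ and occur near $h = (0.6, 0.8)$, the unit vector in the direction of the gradient, while the smallest should be close to $-5$.

```python
import math

grad = (3.0, 4.0)  # hypothetical value of grad f(x); its norm is 5

def D(h):
    # Df(x; h) = grad f(x) . h for a unit vector h
    return grad[0] * h[0] + grad[1] * h[1]

N = 3600
units = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]
best = max(units, key=D)
worst = min(units, key=D)
print(D(best), best)    # close to 5, near (0.6, 0.8)
print(D(worst), worst)  # close to -5, near (-0.6, -0.8)
```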

Not only is this useful geometric information about the directional derivative, but also it gives us a way to describe $\nabla f(x)$ in purely geometric terms, with no reference whatsoever to a coordinate system! Namely, still assuming $x$ is not a critical point,

$\nabla f(x)$ is the unique vector at $x$ in $\mathbb{R}^n$ determined as follows:

• its direction is the direction of maximal increase of $f$ at $x$,

• its magnitude or norm is the rate of this maximal increase.





This is indeed a wonderful situation, one that frequently happens in mathematics. We have an important quantity (in this case, the gradient of a function) which, on the one hand, has a completely geometric description, and which, on the other hand, can be computed in a coordinate system in a very routine and useful manner (in this case, as $(\partial f/\partial x_1, \dots, \partial f/\partial x_n)$). Truly a double-edged sword!




















Level sets. As you may realize, the graph of a function $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ seems to be less visually helpful for $n > 1$ than for $n = 1$. However, the idea of level sets of $f$ seems quite useful. By definition, these are sets of the form

$$\{x\in\mathbb{R}^n \mid f(x)=a\},$$
where $a$ is any fixed real number. These sets are obviously disjoint (as $a$ varies) and fill up all of the domain of $f$. Sketching them is clearly a different matter from sketching the graph of $f$ (a subset of $\mathbb{R}^{n+1}$). They seem especially convenient when $n = 2$.














EXAMPLE. Let $f(x,y) = \dots + y^2$. The level sets of $f$ are of course ellipses with center $(0,0)$ and with the same shape. (Unless $a < 0$, in which case the level set is empty; or $a = 0$, in which case the level set is just $\{(0,0)\}$.) Here are rough sketches of a few level sets, where the numbers refer to the value $f$ attains along that particular level set.


























































































































I have sketched $\nabla f(x,y)$ at three points of the level set $f(x,y) = 1$.

Notice that $\nabla f$ is always orthogonal to the level sets in the above sketch. This feature is always true, but we need to wait for the discussion of the chain rule in Section K to see this. Also notice that the more tightly spaced the level sets are, the larger the gradient must be. This is of course because the norm of the gradient measures the rate of increase of $f$ in that direction, and when the level sets are close together, $f$ must be changing rapidly. When you hike in Colorado with a survey map that shows curves of constant altitude, it’s when they are close together that your hike is strenuous.


PROBLEM 2-65. Here is a rough sketch of some level sets of a certain function $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$:

Suppose that $f$ is differentiable at the indicated point $x_0$. Prove that $x_0$ is a critical point for $f$.


PROBLEM 2-66. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ be differentiable at $x_0$. Suppose that


Gea)” 


Ae 


Calculate $(\nabla f)(x_0)$.
PROBLEM 2-67. Let


Calculate $(\nabla f)(x_0)$.


PROBLEM 2-68. 


x? —y? +3. Also give accurate sketches of Vf at several points in 
R be differentiable at $x_0$. Suppose that





Sketch a few level sets of the function defined on 


(L111)\_, 
XO; 9°99 99 —_YV; 





111 1 
Dilan == et 
t (ai (5.55 ;)) ’ 
11 1 1 
De lee = =) 
t (ai (5.5 9” 5) ’ 
1 1 1 =é/1 
D ; = 3. 
f (0 (=. OY 2’ ;)) 











R? by f(x,y) = 








R?. 











PROBLEM 2-69. Repeat Problem 2-68 but with $f(x,y) = y^2 - x^3 - x^2$.


I. A little matrix algebra 


In the next section we are going to discuss the concept of differentiation for a function from $\mathbb{R}^n$ to $\mathbb{R}^m$. The key concept we shall have to understand is the idea of a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$. The reason is that the derivative in this general case is intimately connected to a type of affine approximation to the given function. This is yet another instance of the intimate connection that calculus provides between algebraic and geometric concepts. In the present section we want to provide the necessary elementary algebraic structure. We first need to modify the provisional definition we gave in Section E:


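DEFINITION. A linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a function $\mathbb{R}^n \xrightarrow{F} \mathbb{R}^m$ which is compatible with vector addition and scalar multiplication, in the sense that

$$\begin{aligned}F(x+y)&=F(x)+F(y),\\ F(ax)&=aF(x),\end{aligned}$$

for all $x, y \in \mathbb{R}^n$ and all $a \in \mathbb{R}$.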
REMARK. Often linear functions are also called linear transformations or linear operators.


PROBLEM 2-70. Here are some immediate consequences of the definition, showing further how a linear function respects the algebraic properties of $\mathbb{R}^n$ and $\mathbb{R}^m$: prove that if $F$ is linear, then

$$\begin{aligned}F(0)&=0,\\ F(-x)&=-F(x),\\ F\bigl(a_1x^{(1)}+a_2x^{(2)}+\dots+a_kx^{(k)}\bigr)&=a_1F(x^{(1)})+a_2F(x^{(2)})+\dots+a_kF(x^{(k)}).\end{aligned}$$

PROBLEM 2-71. Prove that we could have defined linear function by requiring $F$ to “preserve linear combinations,” in the sense that

$$F(ax+by)=aF(x)+bF(y)$$

for all $x, y \in \mathbb{R}^n$ and all $a, b \in \mathbb{R}$.


PROBLEM 2-72. Show that our provisional definition in the case $\mathbb{R}^n \xrightarrow{F} \mathbb{R}$, namely that $F(x) = c\bullet x$, gives linear functions. Moreover, prove that if $\mathbb{R}^n \xrightarrow{F} \mathbb{R}$ is linear, then there exists a unique $c \in \mathbb{R}^n$ such that $F(x) = c\bullet x$.


PROBLEM 2-73. Just as on p. 2-20, we need to understand the difference between affine and linear functions. We define $\mathbb{R}^n \xrightarrow{F} \mathbb{R}^m$ to be affine if

$$F(ax+(1-a)y)=aF(x)+(1-a)F(y)$$

for all vectors $x$ and $y$ and all scalars $a$. Prove that $F$ is affine $\iff$ there exists a linear function $F_0$ and a fixed vector $w \in \mathbb{R}^m$ such that

$$F(x)=w+F_0(x)\quad\text{for all } x.$$

Prove that $F_0$ and $w$ are uniquely determined by $F$.
PROBLEM 2-74. Prove that if $F$ is affine, then

$$F\bigl(a_1x^{(1)}+a_2x^{(2)}+\dots+a_kx^{(k)}\bigr)=a_1F(x^{(1)})+a_2F(x^{(2)})+\dots+a_kF(x^{(k)})$$

whenever $a_1 + a_2 + \dots + a_k = 1$.


It is virtually impossible to provide meaningful pictorial descriptions of linear functions. 
However, there is a classic bookkeeping device for handling them, namely, the algebra of 
matrices. 


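DEFINITION. An $m \times n$ matrix of real numbers is a rectangular array $A$ having $m$ rows and $n$ columns. A standard notation for $A$ is

$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}.$$

We say that the real number $a_{ij}$ is the entry of $A$ in row $i$ and column $j$. We also say that

$$\text{the } i\text{th row of } A \text{ is } \begin{pmatrix}a_{i1}&a_{i2}&\cdots&a_{in}\end{pmatrix}\quad(\text{a } 1\times n \text{ matrix})$$

and

$$\text{the } j\text{th column of } A \text{ is } \begin{pmatrix}a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj}\end{pmatrix}\quad(\text{an } m\times 1 \text{ matrix}).$$

We say that the matrix $A$ has shape $m \times n$. If we are in a context in which the shape of $A$ is known, we may abbreviate

$$A=(a_{ij}).$$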

Here are some basic algebra definitions:

• two matrices $A$ and $B$ are equal $\iff$ they have the same entries at each position ($A$ and $B$ must therefore have the same shape).

• $A + B$ is defined entry-wise ($A$ and $B$ must therefore have the same shape), so that
$$(a_{ij})+(b_{ij})=(a_{ij}+b_{ij});$$
notice that $A + B = B + A$.
• $0$ is the $m \times n$ matrix whose entries are all 0; thus
$$A+0=0+A=A.$$

• If $c \in \mathbb{R}$, $cA$ is defined entry-wise by
$$cA=(ca_{ij}).$$

• Notice that the matrix $-A$ defined as $(-1)A$ is the unique matrix which when added to $A$ gives the matrix $0$.





It is nice to observe that the set of all $m \times n$ matrices is now additively just like $\mathbb{R}^{mn}$. Namely, each $m \times n$ matrix is specified by its $mn$ real entries, arranged in a certain pattern. The same is true of each vector in $\mathbb{R}^{mn}$. Not only that, but also addition of matrices and scalar multiplication are just like they are for $\mathbb{R}^{mn}$.


However, matrices enjoy another algebraic property that far outweighs the above in importance. That is, we can multiply them in certain situations. The precise definition is this:

• if $A = (a_{ij})$ is an $m \times p$ matrix and $B = (b_{ij})$ is a $p \times n$ matrix, then the product $AB$ is the $m \times n$ matrix whose $ij$ entry is

$$\sum_{k=1}^{p}a_{ik}b_{kj}.$$

In other words, the entry of $AB$ in row $i$ and column $j$ is obtained from the $i$th row of $A$ and the $j$th column of $B$ by a kind of dot product of vectors in $\mathbb{R}^p$:

$$(a_{i1},a_{i2},\dots,a_{ip})\bullet(b_{1j},b_{2j},\dots,b_{pj}).$$

(Notice the commas!)
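Here are a few numerical examples:

$$\begin{pmatrix}1&2&-3\end{pmatrix}\begin{pmatrix}-1\\2\\3\end{pmatrix}=\begin{pmatrix}-6\end{pmatrix};$$

$$\begin{pmatrix}-1\\2\\3\end{pmatrix}\begin{pmatrix}1&2&-3\end{pmatrix}=\begin{pmatrix}-1&-2&3\\2&4&-6\\3&6&-9\end{pmatrix};$$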


(5 2)(20 0 Sree 
Ue. eae ane 10-3 


(> 0) (5 “5) = 6 0): 
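The defining formula $\sum_k a_{ik}b_{kj}$ translates directly into code. The following minimal Python sketch stores matrices as lists of rows (an illustrative representation, not the only one) and reproduces the first two numerical examples above.

```python
def matmul(A, B):
    # A is m x p and B is p x n, each stored as a list of rows
    m, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("AB is defined only if columns of A = rows of B")
    n = len(B[0])
    # Entry (i, j) is the "dot product" of row i of A with column j of B
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]

row = [[1, 2, -3]]        # 1 x 3
col = [[-1], [2], [3]]    # 3 x 1
print(matmul(row, col))   # [[-6]]
print(matmul(col, row))   # [[-1, -2, 3], [2, 4, -6], [3, 6, -9]]
```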


To repeat, we can multiply matrices only in this case:

$$\underset{m\times p}{A}\ \text{times}\ \underset{p\times n}{B}=\underset{m\times n}{AB}.$$


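There is an identity for this matrix multiplication. We’ll say that $I$ is the $m \times m$ (notice its shape is square) matrix

$$I=\begin{pmatrix}1&0&\cdots&0\\0&1&\cdots&0\\ \vdots&&\ddots&\vdots\\0&0&\cdots&1\end{pmatrix}.$$

(The context will always determine the size of the identity matrix $I$.) Then

$$\underset{m\times n}{A}\ \text{times}\ \underset{n\times n}{I}=\underset{m\times n}{A},\qquad \underset{m\times m}{I}\ \text{times}\ \underset{m\times n}{A}=\underset{m\times n}{A}.$$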


It is sometimes useful to employ the Kronecker delta function to work with $I$: by definition

$$\delta_{ij}=\begin{cases}1 & \text{if } i=j,\\ 0 & \text{if } i\neq j.\end{cases}$$

Thus,

$$I=(\delta_{ij}).$$
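In code, $I = (\delta_{ij})$ is a one-liner. This minimal Python sketch (matrices as lists of rows; the matrix $A$ is an arbitrary $2 \times 3$ example) builds $I$ from the Kronecker delta and checks that $AI = A$ and $IA = A$, with $I$ of the appropriate size on each side.

```python
def kronecker_delta(i, j):
    return 1 if i == j else 0

def identity(n):
    # I = (delta_ij), an n x n matrix
    return [[kronecker_delta(i, j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    # Entry (i, j) of AB is sum_k A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, -1, 0],
     [5, 3, 7]]                        # 2 x 3
print(matmul(A, identity(3)) == A)     # True: A (m x n) times I (n x n)
print(matmul(identity(2), A) == A)     # True: I (m x m) times A (m x n)
```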
Here are some properties of multiplication:
• it is associative:
$$(AB)C=A(BC).$$
• it is distributive:
$$(A+B)C=AC+BC,\qquad A(B+C)=AB+AC.$$
• $0A = A0 = 0$ (the three “zeros” may all be different!).
• it is not commutative:
$$AB\neq BA\ \text{in general}.$$


The last situation is quite interesting. In the first place, $AB$ is defined only if the number of columns of $A$ equals the number of rows of $B$; and $BA$ is defined only if the number of columns of $B$ equals the number of rows of $A$. Thus $AB$ and $BA$ are both defined only if $A$ is $m \times n$ and $B$ is $n \times m$; then $AB$ is $m \times m$ and $BA$ is $n \times n$. Thus $AB = BA$ is possible only if $m = n$. Thus the only possible situation for $AB = BA$ is that $A$ and $B$ are both square matrices of the same size. But even then they might not “commute”:

$$\begin{pmatrix}1&0\\0&0\end{pmatrix}\begin{pmatrix}0&1\\0&0\end{pmatrix}=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad \begin{pmatrix}0&1\\0&0\end{pmatrix}\begin{pmatrix}1&0\\0&0\end{pmatrix}=\begin{pmatrix}0&0\\0&0\end{pmatrix}.$$

The latter situation also shows that we cannot in general “cancel” matrices from an equation:

$$AB=0\;\not\Rightarrow\;A=0\ \text{or}\ B=0.$$
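The two products above can be checked mechanically. This minimal Python sketch multiplies the $2 \times 2$ matrices from the text in both orders, showing $AB \neq BA$ and a product equal to $0$ although neither factor is $0$.

```python
def matmul(A, B):
    # Entry (i, j) of AB is sum_k A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0], [0, 0]]
B = [[0, 1], [0, 0]]
AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # [[0, 1], [0, 0]]
print(BA)  # [[0, 0], [0, 0]]: a zero product of two nonzero matrices
```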


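PROBLEM 2-75. Show that matrix multiplication does not in general have multiplicative inverses, by showing that there is no $2 \times 2$ matrix $A$ such that

$$A\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$$

even though $\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$ is not zero.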
PROBLEM 2-76. Suppose $A$ is an $n \times n$ matrix which commutes with every $n \times n$ matrix. That is, $AB = BA$ for every $n \times n$ matrix $B$. Prove that $A$ is a scalar multiple of $I$.


Column vectors. In presenting the general definition of derivative it is helpful to introduce a convention for writing linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. The convention is described as follows: in these situations we modify the notational scheme for points in our basic vector space $\mathbb{R}^n$ by expressing them as columns rather than rows, so that we write

$$y = \begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix}, \quad \text{an } n \times 1 \text{ matrix}.$$


The reason for doing this is simple. We want to have a compact expression for a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$. If $A$ is an $m \times n$ matrix, then the matrix product $Ay$ is a well-defined $m \times 1$ matrix. Thus matrix multiplication produces the desired linear function $y \mapsto Ay$.

(If we used the row vector notation we have been following, then the corresponding linear function would have to send $1 \times n$ matrices to $1 \times m$ matrices, so it would need to be written in the form $y \mapsto yA$, where $A$ is an $n \times m$ matrix. As we usually prefer thinking of functions as operating on the left, this would go against our custom. Hence the sudden change to column vectors.)

Thus if $y$ is an $n \times 1$ column vector, and $A$ is an $m \times n$ matrix, then $Ay$ is the column vector in $\mathbb{R}^m$ whose $i$th entry is

$$\sum_{k=1}^{n} a_{ik} y_k.$$

As a special case, note the interesting result involving $\hat{e}_j$, the $j$th coordinate vector for $\mathbb{R}^n$:

$$A\hat{e}_j = \text{the } j\text{th column of } A.$$

Here of course we are writing $\hat{e}_j$ as a column vector:

$$\hat{e}_j = \begin{pmatrix}0\\ \vdots\\ 1\\ \vdots\\ 0\end{pmatrix} \quad (n \times 1 \text{ matrix}, 1 \text{ in position } j).$$
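In code, a column vector can simply be stored as a flat list. Here is a minimal Python sketch (the helper names `matvec` and `coord_vector` are hypothetical) that computes $Ay$ entry by entry and checks that $A\hat{e}_j$ picks out the $j$th column of $A$. Note that Python counts indices from 0, so `coord_vector(0, n)` plays the role of $\hat{e}_1$.

```python
def matvec(A, y):
    """Return Ay, where A is an m x n matrix (list of rows) and y has n entries."""
    if any(len(row) != len(y) for row in A):
        raise ValueError("A must have as many columns as y has entries")
    # i-th entry: sum over k of a_{ik} * y_k
    return [sum(a_ik * y_k for a_ik, y_k in zip(row, y)) for row in A]


def coord_vector(j, n):
    """Coordinate vector with a 1 in (0-based) position j and 0 elsewhere."""
    return [1 if k == j else 0 for k in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]              # 2 x 3, so y -> Ay maps R^3 to R^2
for j in range(3):
    jth_column = [row[j] for row in A]
    assert matvec(A, coord_vector(j, 3)) == jth_column
print(matvec(A, [1, 1, 1]))  # [6, 15]
```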
PROBLEM 2-77. Notice that in the above situation, $Ay$ is certainly a linear function of $y$, in the sense of the definition at the beginning of this section. This exercise establishes the converse. Namely, show that if $\mathbb{R}^n \xrightarrow{F} \mathbb{R}^m$ is linear, then there is a unique $m \times n$ matrix $A$ such that

$$F(y) = Ay \quad \text{for all } y \in \mathbb{R}^n.$$


I want to stress that the matrix A is not the linear function here — it’s not a function at 
all. It is just that when A multiplies vectors as above, the resulting function F is linear. 


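PROBLEM 2-78. Consider this situation:

$$\mathbb{R}^n \xrightarrow{G} \mathbb{R}^p \xrightarrow{F} \mathbb{R}^m,$$

where $F$ is linear and $G$ is linear. Let the corresponding matrices be $A$ and $B$, so that

$$F(z) = Az \quad \text{for all } z \in \mathbb{R}^p, \qquad G(y) = By \quad \text{for all } y \in \mathbb{R}^n.$$

Prove that the composite function $F \circ G$ is linear, and that its corresponding matrix is the product $AB$.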


REMARK. Problem 2-78 gives the actual reason for the strange-looking definition of matrix 
multiplication. 
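Problem 2-78 can be spot-checked numerically before it is proved. The Python sketch below (lists of rows; the sample matrices and helper names are our own choices) uses $\mathbb{R}^3 \xrightarrow{G} \mathbb{R}^2 \xrightarrow{F} \mathbb{R}^3$ and confirms that $F(G(y))$ agrees with $(AB)y$ for a few vectors $y$. A check on a few vectors is of course not a proof.

```python
def matmul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(A, y):
    """Return Ay for a matrix A (list of rows) and a vector y (list)."""
    return [sum(a * yk for a, yk in zip(row, y)) for row in A]


A = [[1, 2],
     [0, 1],
     [3, -1]]          # 3 x 2: F(z) = Az maps R^2 to R^3
B = [[2, 0, 1],
     [1, -1, 4]]       # 2 x 3: G(y) = By maps R^3 to R^2


def F(z):
    return matvec(A, z)


def G(y):
    return matvec(B, y)


AB = matmul(A, B)      # 3 x 3 matrix of the composite F o G
for y in ([1, 0, 0], [0, 2, -1], [3, 5, 7]):
    assert F(G(y)) == matvec(AB, y)
print(AB)              # [[4, -2, 9], [1, -1, 4], [5, 1, -1]]
```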











J. Derivatives for functions $\mathbb{R}^n \to \mathbb{R}^m$

In this section we are going to attain our goal of defining derivatives in general. Recall that in Section A we very easily took care of the case $n = 1$, which essentially was like one-variable calculus; the generalization to values of functions in $\mathbb{R}^m$ rather than $\mathbb{R}$ was quite simple.

However, the case m = 1 and general n was quite another matter. We devoted Sections C, 
D, and E just to getting the definition of differentiability correct. 

Based on the above two paragraphs, you might expect that our current generalization from 
m = 1 to general m will be completely straightforward. This is indeed the case. We can even 
take our cue from the great definition on p. 2-24, as we recognize that the crucial form of the 
numerator, 

















$$f(x+y) - f(x) - c \bullet y,$$

involves the linear function of $y$ represented there by the scalar product. Literally, all we need to do now is insert the correct form of linear function of $y$, namely one that maps $\mathbb{R}^n$ into $\mathbb{R}^m$. We know from Section I that such linear functions can be realized as products of the form $Ay$, where $A$ is an $m \times n$ matrix and $y \in \mathbb{R}^n$ is written as an $n \times 1$ matrix (a column vector). Here then is what we require:























DEFINITION. Let $x \in \mathbb{R}^n$ be a fixed point. Assume $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ is defined at least in a neighborhood of $x$ (a ball $B(x,r)$). Then $f$ is differentiable at $x$ if there exists an $m \times n$ matrix $A$ such that

$$\lim_{y \to 0} \frac{\|f(x+y) - f(x) - Ay\|}{\|y\|} = 0.$$
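The definition can be watched in action numerically. The Python sketch below uses the sample map $f(x_1,x_2) = (x_1^2 x_2,\ \sin x_1 + x_2)$, an arbitrary illustrative choice, together with a candidate matrix $A$ computed by hand from partial derivatives, and evaluates the quotient $\|f(x+y) - f(x) - Ay\|/\|y\|$ for $y$ shrinking toward 0 along a fixed direction. For this smooth $f$ the quotient decreases roughly in proportion to $\|y\|$; a finite table of values only suggests the limit, it does not prove it.

```python
import math


def f(x):
    """Sample map R^2 -> R^2 (an arbitrary illustrative choice)."""
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2]


def matrix_A(x):
    """Candidate 2 x 2 matrix for f at x, computed by hand from partial derivatives."""
    x1, x2 = x
    return [[2 * x1 * x2, x1 * x1],
            [math.cos(x1), 1.0]]


def quotient(x, y):
    """Return ||f(x+y) - f(x) - Ay|| / ||y|| with A = matrix_A(x)."""
    A = matrix_A(x)
    fx = f(x)
    fxy = f([xi + yi for xi, yi in zip(x, y)])
    Ay = [sum(a * yk for a, yk in zip(row, y)) for row in A]
    residual = [u - v - w for u, v, w in zip(fxy, fx, Ay)]
    return math.hypot(*residual) / math.hypot(*y)


x0 = [1.0, 2.0]
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    y = [0.6 * t, -0.8 * t]   # ||y|| = t, fixed direction
    print(f"||y|| = {t:.0e}   quotient = {quotient(x0, y):.3e}")
```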





DISCUSSION 


1. Notice that the numerator in this definition makes good sense, as $f(x+y)$, $f(x)$, and $Ay$ all belong to $\mathbb{R}^m$.














2. This isn't quite like p. 2-24 in case $m = 1$. This is only because in the former definition we wrote our linear function in the form of the scalar product $c \bullet y$. The corresponding form we are now using in this case is

$$Ay = \underset{1 \times n \text{ matrix}}{(c_1\ c_2\ \cdots\ c_n)}\ \underset{n \times 1 \text{ matrix}}{\begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix}}.$$

This is of course equal to the number
$$c_1 y_1 + c_2 y_2 + \cdots + c_n y_n,$$
and in our old notation this is indeed the scalar product
$$(c_1, c_2, \dots, c_n) \bullet (y_1, y_2, \dots, y_n).$$


3. The remark on p. 2-25 still holds in this more general case, and asserts that the affine function of $y$ given by $f(x) + Ay$ is a very good approximation to the function $f(x+y)$ for small $\|y\|$.

4. The proof of p. 2-27 still applies to show that if $f$ is differentiable at $x$, then $f$ is continuous at $x$.


5. Directional derivatives work the same way as before. If we restrict $y$ to have the form $th$ for a fixed $h \in \mathbb{R}^n$ and let $t \to 0$, we obtain the formula
$$Df(x;h) = Ah, \quad \text{all } h \in \mathbb{R}^n.$$














6. TERMINOLOGY. The matrix $A$ is called the Jacobian matrix of $f$ at $x$. This is in honor of the great mathematician

Carl Gustav Jakob Jacobi, 1804-1851.

We shall use either notation for this matrix,
$$(Df)(x) \quad \text{or} \quad f'(x).$$


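7. In particular, let $h = \hat{e}_j$ = the $j$th coordinate vector. Then we have

$$\frac{\partial f}{\partial x_j}(x) = Df(x)\hat{e}_j = j\text{th column of } Df(x)$$

(see p. 2-51). That is, the $n$ columns of $Df(x)$ are given respectively by the $n$ partial derivatives $\partial f/\partial x_1, \dots, \partial f/\partial x_n$. We can embellish this observation as follows. Write the function $f$ in terms of its coordinate functions (of course, arranged in a column!):

$$f(x) = \begin{pmatrix}f_1(x)\\ f_2(x)\\ \vdots\\ f_m(x)\end{pmatrix}.$$

Then the $j$th column of $Df(x)$ is

$$\begin{pmatrix}\partial f_1/\partial x_j\\ \partial f_2/\partial x_j\\ \vdots\\ \partial f_m/\partial x_j\end{pmatrix}.$$

This gives the all-important formula for the Jacobian matrix of $f$ in terms of partial derivatives,

$$(Df)(x) = \begin{pmatrix}\partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n\\ \partial f_2/\partial x_1 & \cdots & \partial f_2/\partial x_n\\ \vdots & & \vdots\\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n\end{pmatrix},$$

where of course all the partial derivatives are evaluated at $x$.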


8. The wonderfully useful sufficient condition for differentiability as given on p. 2-30 is still in force. This means that if $f_1, \dots, f_m$ all have continuous partial derivatives with respect to $x_1, \dots, x_n$, then we have differentiability of $f$ and we can use the formula for the Jacobian matrix as given above.

9. In particular, if all the coordinate functions $f_i$ are of class $C^1$ in $B(x,r)$, then $f$ is differentiable at every point of $B(x,r)$.











10. In the special case of a real-valued function $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$, we now have two distinct notions for the derivative. One is the gradient $\nabla f$ and the other is the Jacobian matrix $Df$. It is perhaps unfortunate that these are different, but they are very similar. The Jacobian matrix is by definition the $1 \times n$ matrix

$$Df = \left(\frac{\partial f}{\partial x_1}\ \ \cdots\ \ \frac{\partial f}{\partial x_n}\right).$$

(No commas!) On the other hand, since we are thinking of vectors in $\mathbb{R}^n$ as column vectors, we should probably now write

$$\nabla f(x) = \begin{pmatrix}\partial f/\partial x_1\\ \vdots\\ \partial f/\partial x_n\end{pmatrix}.$$

There is very little danger of confusion, for the distinction in these two concepts is that between algebra and geometry. The directional derivative is given either way as
$$Df(x;h) = Df(x)h \quad \text{(matrix product)}$$
and
$$Df(x;h) = \nabla f(x) \bullet h \quad \text{(dot product)}.$$
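The discussion above shows that column $j$ of the Jacobian matrix is $\partial f/\partial x_j$ and that $Df(x;h) = Df(x)h$. This suggests a practical way to approximate a Jacobian matrix: estimate each partial derivative by a difference quotient and use it as column $j$. The Python sketch below does this with central differences; the function `numerical_jacobian`, the step size `step`, and the sample map are illustrative assumptions, not from the text. It then checks $Df(x;h) = Df(x)h$ against a difference quotient along $th$, and for a real-valued function compares the $1 \times n$ matrix times $h$ with $\nabla f(x) \bullet h$.

```python
def numerical_jacobian(f, x, step=1e-6):
    """Approximate Df(x): column j is a central-difference estimate of df/dx_j."""
    fx = f(x)
    m, n = len(fx), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * step)
    return J


def sample_f(x):
    """Illustrative map R^2 -> R^2 with exact Jacobian [[x2, x1], [1, 2*x2]]."""
    return [x[0] * x[1], x[0] + x[1] ** 2]


x = [2.0, 3.0]
J = numerical_jacobian(sample_f, x)          # close to [[3, 2], [1, 6]]

# Directional derivative Df(x; h) = Df(x) h, compared with a difference quotient.
h = [1.0, -1.0]
Jh = [sum(a * hk for a, hk in zip(row, h)) for row in J]
t = 1e-6
xp = [xi + t * hi for xi, hi in zip(x, h)]
xm = [xi - t * hi for xi, hi in zip(x, h)]
quotient = [(p - q) / (2 * t) for p, q in zip(sample_f(xp), sample_f(xm))]
print(Jh, quotient)                          # both close to [1, -5]

# Real-valued case: the 1 x n matrix Df(x) times h equals grad f(x) . h.
grad_row = numerical_jacobian(lambda v: [v[0] * v[1]], x)[0]
print(sum(g * hk for g, hk in zip(grad_row, h)))  # close to 3 - 2 = 1
```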


EXAMPLE. Here we work out the Jacobian matrices associated with polar coordinates for $\mathbb{R}^2$. Denoting $\mathbb{R}^2$ as the $x$-$y$ plane, the formulas are
$$x = r\cos\theta, \qquad y = r\sin\theta.$$
We insist that $0 < r < \infty$, so we are leaving out the origin in $\mathbb{R}^2$. Of course, $-\infty < \theta < \infty$, though a point $(x,y) \in \mathbb{R}^2$ determines $\theta$ only up to the addition of integer multiples of $2\pi$.














In order to conform with our column vector notation, we need to write our formulas in column form in terms of a function $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^2$ as

$$f(r,\theta) = \begin{pmatrix}r\cos\theta\\ r\sin\theta\end{pmatrix}.$$

Then

$$Df(r,\theta) = \begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix}.$$

(A small but important point: we have arbitrarily chosen to write the independent variables $r$, $\theta$ in the order displayed. If we had written them in the order $\theta$, $r$, then the columns of our matrix would be interchanged.)


The two columns of this Jacobian matrix have tremendous geometric significance. Notice first that they are orthogonal. We can think of $f$ as depicting points in the $x$-$y$ plane as functions of $r$, $\theta$. Thus $\partial f/\partial r$ describes how the point $f(r,\theta)$ varies with increasing $r$ for fixed $\theta$:

[Figure: the point $(x,y)$, with $r$ increasing for fixed $\theta$]

(Notice that $\|\partial f/\partial r\| = 1$.)


On the other hand, $\partial f/\partial\theta$ describes the motion of $f(r,\theta)$ with increasing $\theta$ for fixed $r$:

[Figure: the point $(x,y)$, with $\theta$ increasing for fixed $r$]

(Notice that $\|\partial f/\partial\theta\| = r$.)
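The observations about the two columns of $Df(r,\theta)$ are easy to confirm numerically. A minimal Python sketch follows; the sample point $r = 2.5$, $\theta = 0.7$ is arbitrary. The columns have dot product 0 and lengths 1 and $r$, up to rounding error.

```python
import math


def polar_jacobian(r, theta):
    """Df(r, theta) for f(r, theta) = (r cos(theta), r sin(theta))."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


r, theta = 2.5, 0.7
J = polar_jacobian(r, theta)
df_dr = [J[0][0], J[1][0]]        # first column: partial f / partial r
df_dtheta = [J[0][1], J[1][1]]    # second column: partial f / partial theta

dot = df_dr[0] * df_dtheta[0] + df_dr[1] * df_dtheta[1]
print(dot)                        # 0 up to rounding error
print(math.hypot(*df_dr))         # 1 up to rounding error
print(math.hypot(*df_dtheta))     # r = 2.5 up to rounding error
```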


The function $f$ assigns Cartesian coordinates if the polar coordinates are given. Conversely, we now consider the function $g$ which assigns polar coordinates to a point given in Cartesian coordinates:

$$g\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}r\\ \theta\end{pmatrix},$$

where the formulas $x = r\cos\theta$, $y = r\sin\theta$ need to be solved for $r$, $\theta$. Of course,
$$r = \sqrt{x^2 + y^2};$$

as for $\theta$, it is only determined "modulo $2\pi$." If we use a formula like $\theta = \arctan(y/x)$, we can readily compute the partial derivatives as

$$\frac{\partial\theta}{\partial x} = \frac{-y}{x^2+y^2}, \qquad \frac{\partial\theta}{\partial y} = \frac{x}{x^2+y^2}.$$





(There's another way to find these partial derivatives. Namely, start with $x = r\cos\theta$, $y = r\sin\theta$, and compute directly. For instance, keeping $y$ fixed and using the subscript notation for partial derivatives,
$$1 = r_x\cos\theta - r\sin\theta\,\theta_x,$$
$$0 = r_x\sin\theta + r\cos\theta\,\theta_x.$$

Now eliminate the "unknown" $r_x$ from this pair of equations (multiply the first by $-\sin\theta$, the second by $\cos\theta$, and add):
$$(-\sin\theta)1 + (\cos\theta)0 = r\sin^2\theta\,\theta_x + r\cos^2\theta\,\theta_x.$$

Thus
$$-\sin\theta = r\theta_x,$$
so that
$$\theta_x = \frac{-\sin\theta}{r} = \frac{-r\sin\theta}{r^2} = \frac{-y}{x^2+y^2}.$$

Likewise for $\theta_y$. This method has the advantage of avoiding any problems with arctangent when $x = 0$.)
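Thus we find the Jacobian matrix

$$Dg(x,y) = \begin{pmatrix}\dfrac{x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}}\\[2ex] \dfrac{-y}{x^2+y^2} & \dfrac{x}{x^2+y^2}\end{pmatrix}.$$

In terms of the polar coordinates themselves this may be written
$$Dg(x,y) = \begin{pmatrix}\cos\theta & \sin\theta\\ -\dfrac{\sin\theta}{r} & \dfrac{\cos\theta}{r}\end{pmatrix}.$$

Notice that $Df(r,\theta)$ and $Dg(x,y)$ are inverse matrices:

$$Df(r,\theta)\,Dg(x,y) = \begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix}\begin{pmatrix}\cos\theta & \sin\theta\\ -\dfrac{\sin\theta}{r} & \dfrac{\cos\theta}{r}\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I.$$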


(We shall see in Section K that this is an illustration of the chain rule.) 
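Here is a numerical version of the computation $Df(r,\theta)\,Dg(x,y) = I$, as a minimal Python sketch. To go from $(x,y)$ back to $\theta$ it uses `math.atan2`, which is our substitution for the formula $\theta = \arctan(y/x)$ and avoids the trouble at $x = 0$ mentioned above; the sample point is arbitrary.

```python
import math


def Df(r, theta):
    """Jacobian of f(r, theta) = (r cos(theta), r sin(theta))."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def Dg(x, y):
    """Jacobian of g(x, y) = (r, theta) in Cartesian form, valid away from the origin."""
    s = x * x + y * y
    r = math.sqrt(s)
    return [[x / r, y / r],
            [-y / s, x / s]]


def matmul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


r, theta = 1.7, 2.4
x, y = r * math.cos(theta), r * math.sin(theta)
theta_back = math.atan2(y, x)       # recovers theta, since -pi < 2.4 <= pi
P = matmul(Df(r, theta), Dg(x, y))
print(P)                            # approximately [[1, 0], [0, 1]]
```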




















PROBLEM 2-79. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^n$ be the function given in Problem 2-36:
$$f(x) = \frac{x}{\|x\|^2}.$$

Calculate the Jacobian matrix $Df(\hat{e}_1)$.

PROBLEM 2-80. For the function of the preceding problem show that

$$Df(x) = \|x\|^{-2} I - 2\|x\|^{-4}(x_i x_j).$$
PROBLEM 2-81. For the function of the preceding problem show that for any $u$, $v \in \mathbb{R}^n$
$$Df(x)u \bullet Df(x)v = \|x\|^{-4}\, u \bullet v.$$

(Because of this result, the function $f$ is said to be conformal.)


PROBLEM 2-82. Suppose $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ is itself a linear function: $f(x) = Ax$. Show that for any $x \in \mathbb{R}^n$
$$Df(x) = A.$$

(In particular, if $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^n$ is the identity function, $f(x) = x$ for all $x$, then $Df(x) = I$.)


PROBLEM 2-83. For any $m \times n$ matrix $A = (a_{ij})$, let $A^t$ be its transpose, namely the $n \times m$ matrix
$$A^t = (a_{ji}).$$

Show that for all $x$, $y$ in the appropriate spaces
$$Ax \bullet y = x \bullet A^t y$$

(what are the appropriate spaces?).


PROBLEM 2-84. Let $A$ be an $n \times n$ matrix and let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be the corresponding quadratic form on $\mathbb{R}^n$,
$$f(x) = Ax \bullet x.$$

Show that
$$\nabla f(x) = (A + A^t)x.$$

(In particular, if $A$ is symmetric, meaning $A^t = A$, then $\nabla f(x) = 2Ax$.)
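Problem 2-84 can be spot-checked before it is proved. Here is a minimal Python sketch, assuming the quadratic form is $f(x) = Ax \bullet x$ and using an arbitrary non-symmetric sample matrix; it compares a central-difference estimate of $\nabla f(x)$ with $(A + A^t)x$. The helper names are our own.

```python
def quad_form(A, x):
    """f(x) = Ax . x for a square matrix A (list of rows)."""
    Ax = [sum(a * xk for a, xk in zip(row, x)) for row in A]
    return sum(u * v for u, v in zip(Ax, x))


def numerical_gradient(f, x, step=1e-6):
    """Central-difference estimate of the gradient of a real-valued f at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        grad.append((f(xp) - f(xm)) / (2 * step))
    return grad


A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]             # deliberately not symmetric
x = [1.0, -2.0, 0.5]
n = len(x)

predicted = [sum((A[i][k] + A[k][i]) * x[k] for k in range(n)) for i in range(n)]
estimate = numerical_gradient(lambda v: quad_form(A, v), x)
print(predicted)                  # [0.25, -9.0, -5.5]
print(estimate)                   # agrees up to rounding error
```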
PROBLEM 2-85. Suppose $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ and $\mathbb{R}^n \xrightarrow{g} \mathbb{R}^m$ are both differentiable at $x$. Show that $f + g$ is also differentiable at $x$, and

$$D(f+g)(x) = Df(x) + Dg(x).$$






































PROBLEM 2-86. Here's another product rule. Suppose $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ and $\mathbb{R}^n \xrightarrow{g} \mathbb{R}^m$ are both differentiable at $x$. Show that $fg$ is also differentiable at $x$, and

$$D(fg)(x) = f(x)\,Dg(x) + g(x)\,Df(x).$$
































PROBLEM 2-87. Another product rule: if $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ and $\mathbb{R}^n \xrightarrow{g} \mathbb{R}^m$, then show that

$$D(f \bullet g) = f^t\,Dg + g^t\,Df.$$
































(Here $f^t$ and $g^t$ refer to the transposed values, so that $f^t(x)$ and $g^t(x)$ are $1 \times m$ matrices (row vectors).)


K. The chain rule 


The material in this section is very, very important in calculus and its applications, as the 
results are used constantly in both theory and practice. It all has to do with the basic concept 
of composition of two functions. In general, whenever two functions f and g are given and it 
makes sense that we are able to define $g(f(x))$, we write the resulting function $g \circ f$:

$$(g \circ f)(x) = g(f(x)).$$


You are surely used to thinking this way, though perhaps not with this notation, from single- 
variable calculus. 
Our abstract framework will involve this situation:

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}^\ell,$$























where as usual we do not require $f$ and $g$ to be defined everywhere. If $n = m = \ell = 1$, then we are in the familiar single-variable calculus situation and we certainly recognize the chain rule in the form
$$\frac{d}{dx}\, g(f(x)) = g'(f(x))\, f'(x).$$
Another notation you are probably familiar with is something like
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}.$$

This formula expresses the expected fact that the derivative of the composition is the product of the derivatives of the two functions involved.

There is a more geometric way of thinking of this in terms of affine approximations. In the more general situation we are investigating, we think of the function $y \mapsto f(x) + Df(x)y$ as the "best" affine approximation of $f(x+y)$ near $y = 0$. Let us temporarily express this approximation in the following notation:

$$f(x+y) \approx f(x) + Ay,$$

where $A = Df(x)$. Likewise, if we denote $B = Dg(f(x))$, then near $z = 0$ we have the affine approximation

$$g(f(x) + z) \approx g(f(x)) + Bz.$$

Therefore we definitely anticipate that we have the approximation near $y = 0$,

$$\begin{aligned}(g \circ f)(x+y) &= g(f(x+y))\\ &\approx g(f(x) + Ay)\\ &\approx g(f(x)) + BAy.\end{aligned}$$

The last expression is an affine function of $y$ and indicates that

$$D(g \circ f)(x) = BA = Dg(f(x))\,Df(x).$$

This is indeed what we shall prove. There's a wonderful moral here: while the composition $g \circ f$ may be very difficult to compute, involving all sorts of complicated operations, the affine approximation of $g \circ f$ is very easy to compute, involving only the very basic algebra of multiplying matrices!

We now state and prove what is also sometimes more accurately termed “the composite 
function theorem.” Our proof does not rely on the single-variable calculus result, but actually 
handles that as a special case. Though it’s a special case, all the essential ingredients are 
present even there. 
THE CHAIN RULE. Assume

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}^\ell.$$

Let $x$ be a fixed point in $\mathbb{R}^n$. Assume

$f$ is differentiable at $x$;
$g$ is differentiable at $f(x)$.

Then
$g \circ f$ is differentiable at $x$

and

$$D(g \circ f)(x) = Dg(f(x))\,Df(x).$$

This formula relating the derivatives is really beautiful, containing not only the three Jacobian matrices, but also expressing the result in terms of the nice definition of matrix multiplication:

$$\underset{\ell \times n}{D(g \circ f)} = \underset{\ell \times m}{Dg}\ \underset{m \times n}{Df}.$$

The proof is not hard at all, essentially merely using the definition of differentiability.
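Before the proof, a numerical illustration (not a proof). The Python sketch below picks two sample maps, $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^3$ and $\mathbb{R}^3 \xrightarrow{g} \mathbb{R}^2$, chosen arbitrarily, approximates each Jacobian matrix by central differences (the helper `numerical_jacobian` and its step size are our own assumptions), and compares $D(g \circ f)(x)$ with the product $Dg(f(x))\,Df(x)$. The shapes are $2 \times 2$, $2 \times 3$, and $3 \times 2$, as the formula requires.

```python
import math


def numerical_jacobian(F, x, step=1e-6):
    """Central-difference approximation of the Jacobian matrix DF(x)."""
    Fx = F(x)
    J = [[0.0] * len(x) for _ in Fx]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        Fp, Fm = F(xp), F(xm)
        for i in range(len(Fx)):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * step)
    return J


def matmul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def f(x):                      # sample map R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


def g(u):                      # sample map R^3 -> R^2
    return [u[0] + u[1] * u[2], math.exp(u[0]) * u[2]]


def g_of_f(x):
    return g(f(x))


x = [0.5, -1.2]
lhs = numerical_jacobian(g_of_f, x)                                  # D(g o f)(x), 2 x 2
rhs = matmul(numerical_jacobian(g, f(x)), numerical_jacobian(f, x))  # (2 x 3)(3 x 2)
print(lhs)
print(rhs)                     # agree up to discretization and rounding error
```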


PROOF. As above, we denote $A = Df(x)$ ($m \times n$ matrix) and $B = Dg(f(x))$ ($\ell \times m$ matrix). The first step in the proof is going to express $g$ as the sum of two functions, one of which is linear and the other of which varies so little near $f(x)$ that it contributes zero to the derivative at $f(x)$. Namely, we write $g = g_1 + g_2$, where

$$g_1(w) = Bw \quad \text{(a linear function)},$$
$$g_2(w) = g(w) - Bw.$$

Notice that

$$Dg_1(f(x)) = B \quad \text{(Problem 2-82)},$$
$$Dg_2(f(x)) = Dg(f(x)) - B = 0 \quad \text{(Problem 2-85)}.$$
Now $g \circ f = g_1 \circ f + g_2 \circ f$, so again we can use Problem 2-85 once we prove that $g_1 \circ f$ and $g_2 \circ f$ are differentiable at $x$. First we have a straightforward algebra calculation:

$$\begin{aligned}\lim_{y \to 0} \frac{(g_1 \circ f)(x+y) - (g_1 \circ f)(x) - BAy}{\|y\|} &= \lim_{y \to 0} \frac{Bf(x+y) - Bf(x) - BAy}{\|y\|}\\ &= \lim_{y \to 0} \frac{B\bigl(f(x+y) - f(x) - Ay\bigr)}{\|y\|}\\ &= B \lim_{y \to 0} \frac{f(x+y) - f(x) - Ay}{\|y\|}\\ &= B0 = 0.\end{aligned}$$

This proves that $g_1 \circ f$ is differentiable at $x$ and
$$D(g_1 \circ f)(x) = BA.$$
Now we are going to show that $g_2 \circ f$ is differentiable at $x$ and
$$D(g_2 \circ f)(x) = 0.$$
(This will finish the proof, thanks to Problem 2-85.) That is, we are going to prove that

$$\lim_{y \to 0} \frac{g_2(f(x+y)) - g_2(f(x))}{\|y\|} = 0.$$





This is completely expected, thanks to the fact that $Dg_2(f(x)) = 0$. We just need to check that the presence of $f$ doesn't cause any unwelcome surprises.

We first claim that $f$ satisfies a Lipschitz condition at the point $x$. Namely (see Problem 2-44), there exists a constant $C$ and a positive number $\delta$ such that

$$\|f(x+y) - f(x)\| \le C\|y\|$$

for all $\|y\| < \delta$. (We'll prove this at the end.)
Now for $\|y\| < \delta$ we consider the quotient

$$\frac{\|g_2(f(x+y)) - g_2(f(x))\|}{\|y\|}. \qquad (*)$$
If it happens to equal zero for a particular $y$, then we are of course quite pleased, as we want to show $(*)$ has limit zero. So we need only be concerned with those $y$ for which $\|y\| < \delta$ and $(*) \neq 0$. In particular, $f(x+y) - f(x) \neq 0$. Let us call this latter difference $w$. Of course, $w$ depends on $y$ and in fact $\|w\| \le C\|y\|$. In this situation

$$(*) = \frac{\|g_2(f(x)+w) - g_2(f(x))\|}{\|w\|}\cdot\frac{\|w\|}{\|y\|} \le C\,\frac{\|g_2(f(x)+w) - g_2(f(x))\|}{\|w\|}.$$

This last quantity has limit zero as $y \to 0$, since also $w \to 0$ and we are given that $Dg_2(f(x)) = 0$.

Finally we establish the Lipschitz condition for $f$. We first notice that the entries of the fixed matrix $A$ are just fixed numbers, and thus there is a number $C_0$ such that we have the inequality for norms,

$$\|Ay\| \le C_0\|y\| \quad \text{for all } y \in \mathbb{R}^n.$$

Next, the triangle inequality implies

$$\frac{\|f(x+y) - f(x)\|}{\|y\|} \le \frac{\|f(x+y) - f(x) - Ay\|}{\|y\|} + \frac{\|Ay\|}{\|y\|} \le \frac{\|f(x+y) - f(x) - Ay\|}{\|y\|} + C_0;$$

this sum has limit $C_0$ as $y \to 0$, because $A = Df(x)$. Thus it is no larger than, say, $1 + C_0 = C$ for all sufficiently small $y$ (say, $\|y\| < \delta$).
QED

















PROBLEM 2-88. In a general situation $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ in which $f$ is differentiable at a fixed point $x$, define the affine approximation to be the function $\mathrm{Aff}(f,x)$, where

$$\mathrm{Aff}(f,x)(u) = f(x) + Df(x)(u - x) \quad \text{for all } u \in \mathbb{R}^n.$$

(Why?) Then prove as a result of the chain rule

$$\mathrm{Aff}(g \circ f, x) = \mathrm{Aff}(g, f(x)) \circ \mathrm{Aff}(f, x).$$


ILLUSTRATIONS. 
1. Look again at the polar coordinate example on pp. 2-55 through 2-58. There we have a situation where $f$ and $g$ are inverses of one another, so that $f \circ g$ = the identity function from $\mathbb{R}^2$ to $\mathbb{R}^2$. Thus $D(f \circ g)(x,y) = I$ by Problem 2-82. The chain rule thus gives

$$Df(g(x,y))\,Dg(x,y) = I,$$

just as we observed by explicit calculation on p. 2-58.
































2. More generally, any time we have a situation $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^n \xrightarrow{g} \mathbb{R}^n$ in which $f$ and $g$ are inverses in the sense that $g \circ f$ = the identity function, then

$$Dg(f(x))\,Df(x) = I.$$

(We say that the corresponding Jacobian matrices are inverses of one another.) This comes as no surprise. After all, if $f$ and $g$ are inverses then their affine approximations should also be inverses, thanks to Problem 2-88.


3. The most important abstract situation to understand is the case $n = 1$, $\ell = 1$:

$$\mathbb{R} \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}.$$

We shall in fact show in the two subsequent items that the generalization to arbitrary $n$ and $\ell$ is then immediate. For this illustration we shall write generically $f = f(t)$ and $g = g(u) = g(u_1, \dots, u_m)$. The chain rule then tells us immediately

$$D(g \circ f)(t) = Dg(f(t))\,Df(t).$$

That is,

$$\frac{d}{dt}\, g(f(t)) = \sum_{k=1}^{m} \frac{\partial g}{\partial u_k}(f(t))\, f_k'(t).$$

It seems to help in remembering this formula to abuse the notation by writing $f(t)$ as $u(t)$. The idea is that the independent variables $u_k$ have been replaced by functions $u_k(t)$ and the resulting calculus formula is

$$\frac{d}{dt}\, g(u_1(t), \dots, u_m(t)) = \sum_{k=1}^{m} \frac{\partial g}{\partial u_k}\,\frac{du_k}{dt}.$$
So you can really just use the single-variable chain rule as a pattern and just keep
differentiating “as long as you see a t.”


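4. The generalization

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}$$

is immediate, as the act of computing $\partial/\partial x_j$ is a matter of letting the other coordinates be fixed and applying number 3:

$$\frac{\partial}{\partial x_j}\, g(u_1(x), \dots, u_m(x)) = \sum_{k=1}^{m} \frac{\partial g}{\partial u_k}\,\frac{\partial u_k}{\partial x_j},$$

an equation that is valid for each $j$ between 1 and $n$.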


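5. The full generalization

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}^\ell$$

now follows simply by applying number 4 to each component of $g$, one at a time:

$$\frac{\partial}{\partial x_i}\, g_j(u_1(x), \dots, u_m(x)) = \sum_{k=1}^{m} \frac{\partial g_j}{\partial u_k}\,\frac{\partial u_k}{\partial x_i},$$

valid for $1 \le i \le n$ and $1 \le j \le \ell$.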





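6. The special case

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R} \xrightarrow{g} \mathbb{R}$$

is often quite useful. We have

$$\frac{\partial}{\partial x_i}\, g(f(x)) = g'(f(x))\,\frac{\partial f}{\partial x_i}.$$

In terms of the gradient notation,
$$\nabla(g \circ f)(x) = g'(f(x))\,\nabla f(x).$$

For instance,
$$\nabla(e^f) = e^f\,\nabla f, \qquad \nabla\bigl(g(\|x\|)\bigr) = g'(\|x\|)\,\frac{x}{\|x\|}.$$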
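The one-variable formula $\frac{d}{dt}\,g(u(t)) = \sum_k (\partial g/\partial u_k)(du_k/dt)$ and the radial gradient formula above can both be checked numerically. In the Python sketch below, the curve $u(t) = (\cos t, \sin t, t)$, the function $g(u) = u_1u_2 + u_3^2$, and the radial profile $\varphi(s) = s^3$ are arbitrary sample choices; derivatives of the composites are estimated by central differences with an assumed step of $10^{-6}$.

```python
import math


def u(t):
    """Sample curve R -> R^3."""
    return [math.cos(t), math.sin(t), t]


def du_dt(t):
    return [-math.sin(t), math.cos(t), 1.0]


def g(v):
    """Sample function R^3 -> R: g(u1, u2, u3) = u1*u2 + u3**2."""
    return v[0] * v[1] + v[2] ** 2


def grad_g(v):
    return [v[1], v[0], 2 * v[2]]


t, step = 0.8, 1e-6
chain_rule = sum(a * b for a, b in zip(grad_g(u(t)), du_dt(t)))
difference = (g(u(t + step)) - g(u(t - step))) / (2 * step)
print(chain_rule, difference)        # both close to cos(2t) + 2t


# Radial case: for phi(s) = s**3, grad phi(||x||) = 3 ||x||**2 * x / ||x||.
def radial(x):
    return math.sqrt(sum(c * c for c in x)) ** 3


x = [1.0, -2.0, 2.0]                 # ||x|| = 3
predicted = [3 * 3.0 ** 2 * c / 3.0 for c in x]
estimate = []
for j in range(len(x)):
    xp, xm = list(x), list(x)
    xp[j] += step
    xm[j] -= step
    estimate.append((radial(xp) - radial(xm)) / (2 * step))
print(predicted)                     # [9.0, -18.0, 18.0]
print(estimate)                      # agrees up to rounding error
```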
PROBLEM 2-89. Here is another proof that the pathological function of Problem 2-43 is indeed not differentiable at the origin. Calculate for that function that


d..o 1 
qi Whe ) = 5° 


Why does this show that f is not differentiable at 0? 


PROBLEM 2-90. Define $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ by


2Iy)/4 
f(z, y) = vtty? for 
0 
Show that $f$ is continuous on $\mathbb{R}^2$ and that all directional derivatives $Df(0;h) = 0$. Then prove that $f$ is not differentiable at the origin by consideration of $f(t, t^2)$.


PROBLEM 2-91. This is a rather standard situation that frequently arises in thermodynamics and other applications. Suppose $\mathbb{R}^3 \xrightarrow{F} \mathbb{R}$ is a differentiable function whose first order partial derivatives are never zero. Furthermore, suppose that the equation

$$F(x,y,z) = 0$$

can be solved for each of the three "unknowns" as functions of the other two variables. For example, in this way we can regard $x$ as a function of $y$ and $z$, so with abuse of notation

$$F(x,y,z) = 0 \text{ produces } x = x(y,z).$$

It then makes sense to define $\partial x/\partial y$, the partial derivative of $x(y,z)$ with $z$ held fixed. Prove that

$$\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial x} = -1.$$
PROBLEM 2-92. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^2$ be defined by
$$f(x_1, x_2) = (x_1^2 - x_2^2,\ 2x_1 x_2).$$

Calculate the $2 \times 2$ Jacobian matrix $Df(x)$.


PROBLEM 2-93. The function of the preceding problem actually comes from the corresponding complex function $(x_1 + ix_2)^2$. Since every complex number other than 0 has two square roots, the equation $f(x) = y$ should have two distinct solutions for each $y \neq 0$ in $\mathbb{R}^2$. Find them explicitly. (Part of the answer is


22: a 


PROBLEM 2-94. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^2$ be defined by
$$f(x_1, x_2) = (x_1^3 - 3x_1x_2^2,\ 3x_1^2x_2 - x_2^3)$$

and calculate $Df(x)$. What complex function does this resemble?


PROBLEM 2-95. The complex exponential function $e^z$ produces through Euler's formula the function $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^2$ given as
$$f(x_1, x_2) = (e^{x_1}\cos x_2,\ e^{x_1}\sin x_2).$$

Calculate the Jacobian matrix $Df(x_1, x_2)$.
PROBLEM 2-96. Given $y \neq 0$ in $\mathbb{R}^2$ one can define a complex logarithm of $y_1 + iy_2$ to be any complex number $x_1 + ix_2$ such that $e^{x_1 + ix_2} = y_1 + iy_2$. In terms of the function $f$ of the preceding exercise, this means that $f(x) = y$. Find all such points $x$.

Roughly speaking, the answer is
$$x = \left(\log\|y\|,\ \arctan\frac{y_2}{y_1}\right).$$


PROBLEM 2-97. Pretend that the formula of Problem 2-96 gives a (single-valued) function $x = g(y)$. Verify directly from that and Problem 2-95 the relation

$$Dg(f(x))\,Df(x) = I.$$





PROBLEM 2-98. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}^4$ be defined by

$$f(\alpha, \beta) = (\cos\alpha, \sin\alpha, \cos\beta, \sin\beta).$$

Calculate the $4 \times 2$ Jacobian matrix $Df(\alpha, \beta)$. Show that the two columns of this matrix are unit vectors orthogonal to one another.


L. Confession 


“Confess your faults to one another” 
James 5:16


Although we have given a correct definition of the differentiability at $x$ of a function $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$, together with the Jacobian matrix $Df(x)$, that is not quite the whole truth. The reason is that we have relied on the identification of linear functions with their corresponding matrices.


The better definition ignores matrices entirely and just mentions the linear functions. Thus we can equivalently define $f$ to be differentiable at $x$ if there exists a linear function

$$\mathbb{R}^n \xrightarrow{L} \mathbb{R}^m$$

such that
$$\lim_{y \to 0} \frac{\|f(x+y) - f(x) - L(y)\|}{\|y\|} = 0.$$
The corresponding terminology might be this: the linear function $L$ is called the differential of $f$ at $x$, and is denoted

$$(df)(x) = L.$$

The correspondence with the Jacobian matrix $Df(x)$ is that

$$\underset{m \times 1}{(df)(x)(y)} = \underset{m \times n}{Df(x)}\ \underset{n \times 1}{y} \quad \text{for all } y \in \mathbb{R}^n.$$


Thus for example we have

PROBLEM 2-99.
a. If $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ is linear, show that $df(x) = f$.

b. If $\mathbb{R}^n \xrightarrow{f} \mathbb{R}^m$ is affine, show that $df(x) = f - f(0)$.





PROBLEM 2-100. Use the notation of the statement of the chain rule as given in Section K. Show that
$$d(g \circ f)(x) = dg(f(x)) \circ df(x).$$


What is the point of this shift in emphasis? It is that we gain in geometric insight by stressing the geometric object $df(x)$ instead of the algebraic object $Df(x)$. But not only that. We often have situations in which the $\hat{e}_j$'s are not the natural basis vectors to be using and the given coordinates are somehow unnatural. Then the actual $m \times n$ matrix $Df(x)$ may be of little interest, and we might prefer using a different $m \times n$ matrix.

Notice also that the statement of the chain rule is more elegant in the new formulation, 
as both sides of the formula involve composition of functions. The algebra involved in matrix 
multiplication does not appear. 

In summary, we might say that the linear function df(x) is represented by the Jacobian 
matrix Df (x) in the usual coordinate systems we are using. 
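The distinction between a linear function and the matrix representing it can be made concrete. In the Python sketch below, a linear function $L$ on $\mathbb{R}^2$ is given as an ordinary function, an arbitrary sample $L(y) = (2y_1 + y_2,\ y_1 + 2y_2)$. Its matrix in the usual coordinates is read off column by column as $L(\hat{e}_j)$, in the spirit of Problem 2-77; using the basis $(1,1)$, $(1,-1)$ instead, the same $L$ is represented by a different, diagonal matrix. The helper names and the sample basis are our own choices.

```python
def L(y):
    """A sample linear function R^2 -> R^2, given without reference to any matrix."""
    return [2 * y[0] + y[1], y[0] + 2 * y[1]]


def matrix_in_standard_basis(linear_map, n):
    """Column j is linear_map(e_j), the image of the j-th coordinate vector."""
    columns = [linear_map([1 if k == j else 0 for k in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*columns)]


def coordinates(v, b1, b2):
    """Solve c1*b1 + c2*b2 = v for (c1, c2) by Cramer's rule (b1, b2 independent)."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    c1 = (v[0] * b2[1] - b2[0] * v[1]) / det
    c2 = (b1[0] * v[1] - v[0] * b1[1]) / det
    return [c1, c2]


def matrix_in_basis(linear_map, b1, b2):
    """Column j holds the coordinates of linear_map(b_j) with respect to b1, b2."""
    columns = [coordinates(linear_map(b), b1, b2) for b in (b1, b2)]
    return [list(row) for row in zip(*columns)]


print(matrix_in_standard_basis(L, 2))        # [[2, 1], [1, 2]]
print(matrix_in_basis(L, [1, 1], [1, -1]))   # diagonal, with entries 3 and 1
```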


M. Homogeneous functions and Euler’s formula 


In this section we are concerned with functions $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ which are defined on all of $\mathbb{R}^n$ except the origin. Let $a$ be a fixed real number.
DEFINITION. The function is homogeneous of degree $a$ if
$$f(tx) = t^a f(x) \quad \text{for all } 0 < t < \infty \text{ and all } x \in \mathbb{R}^n - \{0\}. \qquad (*)$$

Assume from now on that $f$ is of class $C^1$ on the set $\mathbb{R}^n - \{0\}$.


PROBLEM 2-101. Prove that if $f$ is homogeneous of degree $a$, then the partial derivatives $\partial f/\partial x_i$ are homogeneous of degree $a - 1$.


PROBLEM 2-102. For fixed $x \neq 0$ differentiate the equation $(*)$ with respect to $t$. Then set $t = 1$ and conclude that the Euler equation is satisfied:

$$\sum_{i=1}^{n} x_i \frac{\partial f}{\partial x_i}(x) = a f(x). \qquad (**)$$
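The Euler equation $\sum_i x_i\,\partial f/\partial x_i = a f(x)$ is easy to test numerically. The Python sketch below uses the sample function $f(x_1,x_2) = x_1^2 x_2/\|x\|$, which is homogeneous of degree $a = 2$ on $\mathbb{R}^2 - \{0\}$ (our choice, not from the text). It checks $f(tx) = t^2 f(x)$ and compares $x \bullet \nabla f(x)$, with the gradient estimated by central differences at an assumed step of $10^{-6}$, to $2f(x)$.

```python
import math


def f(x):
    """Sample function, homogeneous of degree 2 on R^2 minus the origin."""
    return x[0] ** 2 * x[1] / math.hypot(x[0], x[1])


def numerical_gradient(func, x, step=1e-6):
    """Central-difference estimate of the gradient of a real-valued func at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        grad.append((func(xp) - func(xm)) / (2 * step))
    return grad


a = 2
x = [1.5, -0.5]
t = 3.0
print(f([t * c for c in x]), t ** a * f(x))   # homogeneity: equal up to rounding

euler_lhs = sum(xi * gi for xi, gi in zip(x, numerical_gradient(f, x)))
print(euler_lhs, a * f(x))                    # Euler equation: approximately equal
```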


PROBLEM 2-103. Conversely, assume that the Euler equation $(**)$ is satisfied and then prove that $f$ is homogeneous of degree $a$. (HINT: $\frac{d}{dt}\bigl(t^{-a} f(tx)\bigr)$.)


PROBLEM 2-104. Assume $f$ is a polynomial which is homogeneous of degree $a$, where $a$ is of course a nonnegative integer. Establish the Euler equation for $f$ by explicitly calculating what happens for the individual monomials

$$x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}, \qquad a_1 + a_2 + \cdots + a_n = a.$$


PROBLEM 2-105. Suppose $f$ is homogeneous of degree 1, and define $f(0) = 0$. Assume that $f$ is differentiable at 0. Prove that

$$f(x) = \nabla f(0) \bullet x.$$


PROBLEM 2-106. Give an example of a function which is homogeneous of degree 1 and continuous on $\mathbb{R}^n$ and not differentiable at 0.





